Kevin Thornton
1/17/17
HPC
For overview/docs
You can install your own modules!!!!
($USER == your user name!)
This is advanced. You need to know things:
Feel free to email me if you want to do this
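Roughly how it works (a sketch only; the directory layout, tool name, and install path below are made up, so check the HPC docs linked above for the real recipe): drop a modulefile somewhere under your home directory, then point the module command at that directory.
#a personal modulefile tree (hypothetical layout)
mkdir -p ~/modulefiles/mytool
#a minimal Tcl modulefile for "mytool" version 1.0 (install path is made up)
cat > ~/modulefiles/mytool/1.0 <<EOF
#%Module1.0
prepend-path PATH /data/users/$USER/mytool/1.0/bin
EOF
#add your tree to the module search path, then load like any other module
module use ~/modulefiles
module load mytool/1.0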
#Get a list of what queues you have
#access to & what is free
q
This means you are limited to 512/64 = 8 Gb or 256/64 = 4 Gb of RAM per core, and thus per single-core job.
HPC does not enforce this. You have to know how much RAM your jobs take!
#!/bin/bash
#$ -q bio
#Output will be time (seconds) and peak RAM
#use (in kilobytes). Divide by 1024^2 to get Gb.
/usr/bin/time -f "%e %M" -o memtime.txt command params
If a single task takes more than 8Gb RAM, request more cores as needed:
#$ -pe openmp 2 #for 8-16Gb needed
#$ -pe openmp 3 #for 16-24Gb needed
Do not request 64 cores “just because”
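To turn a measured peak from memtime.txt into a core request, a quick back-of-the-envelope helps (a sketch; it assumes the 512Gb/64-core nodes, i.e. 8Gb per core):
#peak RSS is the 2nd field of memtime.txt (kilobytes)
PEAK_KB=$(awk '{print $2}' memtime.txt)
#round up to a whole number of 8Gb (= 8*1024^2 Kb) slots
SLOTS=$(echo "($PEAK_KB + 8*1024^2 - 1)/(8*1024^2)" | bc)
echo "request: #$ -pe openmp $SLOTS"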
If your job will consume all RAM on the 256Gb nodes, simply avoid them:
#Only run on nodes with 512Gb RAM
#$ -l mem_size=512
#!/bin/bash
#request anywhere from 16-64 cores
#$ -pe openmp 16-64
#The no. cores given gets assigned
#automagically to CORES
program --nthreads $CORES
Do not test on HPC.
It is a shared resource.
Test on your own systems.
Yes, that may be tricky, but too bad.
Probably the most common type of job: a simple serial (single-core) script.
#!/bin/bash
#$ -q krt,krti,bio,pub64
cd $SGE_O_WORKDIR
module load foo
foo infile outfile
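Submitting and checking on it looks like this (assuming you saved the script above as myjob.sh; the name is just an example):
qsub myjob.sh
qstat -u $USER #qw = waiting in queue, r = running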
Array jobs: great for repetitive tasks.
#!/bin/bash
#$ -q bio,abio,free64,pub64
#$ -t 1-1000
cd $SGE_O_WORKDIR
module load krthornt/anaconda/3
SEED=`echo "$SGE_TASK_ID*$RANDOM"|bc -l`
SEED2=`echo "$SEED*$RANDOM"|bc -l`
SEED3=`echo "$SEED2*$SGE_TASK_ID*$RANDOM"|bc -l`
#pass the seeds to mspms so each task's replicate differs
mspms 100 1 -t 1000 -r 1000 10000 --random-seeds $SEED $SEED2 $SEED3 | gzip > mspms.$SGE_TASK_ID.out.gz
#!/bin/bash
#replace 100 with no. lines in commands.txt
#$ -t 1-100
#run the $SGE_TASK_ID'th line of commands.txt
eval `head -n $SGE_TASK_ID commands.txt | tail -n 1`
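To get the right number for the -t range:
wc -l < commands.txt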
qsub -N JOB1NAME job1.sh
qsub -N JOB2NAME -hold_jid JOB1NAME job2.sh
Job 2 will wait in the queue until job 1 completes!
Example script:
#!/bin/bash
for i in $(ls *.sh | grep -i jobs)
do
qsub $i
done
Example script:
#!/bin/bash
qsub -N STEP1 step1.sh
qsub -N STEP2 -hold_jid STEP1 step2.sh
qsub -N STEP3 -hold_jid STEP2 step3.sh
A “rude” script:
#!/bin/bash
#This job submits to free
#queues but will pause if
#priority user comes along
#$ -q bio,abio,free64
#$ -pe openmp 32
mycommand
Two fixes: restart (the job starts over from scratch) or checkpointing (the job resumes where it stopped).
Fixing the “rude” script:
#!/bin/bash
#$ -q bio,abio,free64
#$ -pe openmp 32
#$ -ckpt restart
mycommand
Use case: jobs that can be killed and safely re-run from the beginning.
Fixing the “rude” script:
#!/bin/bash
#$ -q bio,abio,free64
#$ -pe openmp 32
#$ -ckpt blcr
mycommand
Use case: long-running jobs that should resume where they left off (blcr = Berkeley Lab Checkpoint/Restart).
Example 1
In file “commands.txt”:
ls -lhrt > out1
ls -lhrS > out2
To execute those commands in parallel (asynchronously):
parallel :::: commands.txt
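By default, parallel runs as many commands at once as there are cores on the machine. If you do this inside a job script that requested cores with -pe openmp, you can cap it explicitly (a sketch reusing the $CORES variable from earlier):
parallel -j $CORES :::: commands.txt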
Let's revisit this using GNU parallel:
#!/bin/bash
#$ -q bio,abio,free64,pub64
#$ -t 1-1000
cd $SGE_O_WORKDIR
module load krthornt/anaconda/3
SEED=`echo "$SGE_TASK_ID*$RANDOM"|bc -l`
SEED2=`echo "$SEED*$RANDOM"|bc -l`
SEED3=`echo "$SEED2*$SGE_TASK_ID*$RANDOM"|bc -l`
#pass the seeds to mspms so each task's replicate differs
mspms 100 1 -t 1000 -r 1000 10000 --random-seeds $SEED $SEED2 $SEED3 | gzip > mspms.$SGE_TASK_ID.out.gz
Why would we want to do this?
Step 1: generate seeds
#!/bin/bash
for i in {1..1000}
do
echo $RANDOM $RANDOM $RANDOM
done > seeds
Now, we have a fixed set of seeds. Yay for reproducibility!
Step 2: figuring out parallel
The command is this:
parallel --colsep ' ' mspms 100 10 -t 5000 -r 5000 50000 --random-seeds {} :::: seeds
Oh boy, let's break that down:
mspms 100 10 -t 5000 -r 5000 50000 --random-seeds {}
parallel --colsep ' '
Take the stuff from the input file (seeds), and break it into chunks separated by a space.
:::: seeds
Use the stuff in a file called seeds to fill in the {} placeholder, one job per line.
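So if the first line of seeds happened to be 123 456 789, the first command parallel launches would be (roughly):
mspms 100 10 -t 5000 -r 5000 50000 --random-seeds 123 456 789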
It is simple:
#!/bin/bash
#$ -q bio
#$ -pe openmp 64
cd $SGE_O_WORKDIR
parallel --colsep ' ' mspms 100 10 -t 5000 -r 5000 50000 --random-seeds {} :::: seeds
Rather than submit 1,000 jobs to the queue via an array job, we take over an entire node and use all 64 cores to crank through the jobs.
The two strategies can be mixed and matched:
#!/bin/bash
#$ -q bio
#$ -t 1-10 #one task per input file; adjust the range
cd $SGE_O_WORKDIR
#untested :)
parallel [args for parallel] command :::: args.$SGE_TASK_ID
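For example (also untested; it assumes you have split the seeds file into ten pieces named seeds.1 through seeds.10), each array task could grab a whole node and chew through its own chunk:
#!/bin/bash
#$ -q bio
#$ -pe openmp 64
#$ -t 1-10
cd $SGE_O_WORKDIR
module load krthornt/anaconda/3
#each task runs its own chunk of seeds in parallel on its node
parallel --colsep ' ' mspms 100 10 -t 5000 -r 5000 50000 --random-seeds {} :::: seeds.$SGE_TASK_ID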