2

My question is similar to this question:

Make use of all CPUs on SLURM

Long story short, I want to use all available CPU cores, over as many nodes as possible.

The difference is that instead of a single job that's an MPI program, my job consists of N independent tasks, of 1 core per task. N could potentially be greater than the total number of available cores, in which case some tasks would just need to wait.

For example, say I have a cluster of 32 cores. And say I'd like to run the same program (worker_script.sh), 100 times, each with different input. Each call to worker_script.sh is a task. I would like the first 32 tasks to run, while the remaining 68 tasks would be queued. When cores free up, the later tasks would run. Eventually, my job is considered finished when all tasks are done running.

What is the proper way to do that? I wrote the following script and invoked it with sbatch, but it just runs everything on the same core, so it ended up taking forever.

#!/bin/bash
ctr=0
while [[ $ctr -lt 100 ]]; do 
   srun worker_script.sh $ctr &
   ((ctr++))
done

wait

Alternatively, I could invoke the above script directly, without sbatch. That seemed to do the trick: it took over all 32 cores and queued up everything else. When cores freed up, they were allocated to the remaining calls to worker_script.sh. Eventually, all 100 jobs finished, all out of order of course, as expected.

The difference is that instead of 1 job of 100 tasks, it was 100 jobs of 1 task each.

Is there a reason I can't do 100 independent tasks? Am I fundamentally wrong to begin with? Should I be doing 100 jobs instead of 100 tasks?

user3240688
  • Did you add any sbatch parameters when submitting the script? `-n` or anything like it? – Marcus Boden Aug 24 '20 at 09:48
  • I did not. Should I? I'd like it to use up as many of the available resources as possible. – user3240688 Aug 24 '20 at 22:13
  • Yes, by default, you will only get a single task. If you call srun inside a script submitted via sbatch, it will be limited to the number of tasks allocated via sbatch. This means that srun has only a single task to work with, which is why they run sequentially. – Marcus Boden Aug 25 '20 at 11:12

2 Answers

2

If you submit that script via sbatch, it allocates a single task to the job, and inside the job the srun command is limited to the resources of that job. This is why your calculations run sequentially when you submit it via sbatch.

If you just run the script without sbatch, each call to srun creates a new job (as you already noticed), so it is not limited to a single task.

Is there a reason I can't do 100 independent tasks? Am I fundamentally wrong to begin with? Should I be doing 100 jobs instead of 100 tasks?

In the end, it is largely a matter of personal preference. You can have a single job with 100 tasks:

#!/bin/bash
#SBATCH -n 32                        # allocate 32 tasks for the job
ctr=0
while [[ $ctr -lt 100 ]]; do
   srun -n 1 worker_script.sh $ctr &   # each step consumes one of the 32 tasks
   ((ctr++))
done

wait                                 # the job finishes once all 100 steps are done

This will allocate 32 tasks, and each srun call will consume one of them; the remaining calls wait in the background until a task slot frees up. Disadvantage: you will need to wait until 32 tasks are free at once, which means you will likely wait longer in the queue.

A better way (in my opinion) is to use a job array:

#!/bin/bash
#SBATCH -a 0-99%32          # array indices 0-99, at most 32 running at the same time
worker_script.sh $SLURM_ARRAY_TASK_ID

This creates a job array with 100 jobs, of which 32 can run simultaneously. If you don't need/want that limit, you can simply remove the %32 part from the #SBATCH parameter. Why is this better? If your tasks are completely independent, there's no real need to have them all in one job. And this way, a task can run as soon as a slot frees up anywhere, which should keep the time in the queue to a minimum.
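If each array task needs a different input value rather than just its index, a common pattern is to map $SLURM_ARRAY_TASK_ID to a line of an input file. A minimal sketch, assuming a hypothetical inputs.txt with one input value per line:

#!/bin/bash
#SBATCH -a 0-99%32
# hypothetical inputs.txt: 100 lines, one input value per line
input=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" inputs.txt)   # task ID 0 -> line 1, etc.
worker_script.sh "$input"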

Additionally, using job arrays is elegant and puts less load on the scheduler. Your admins will likely prefer having a large job array over numerous identical jobs submitted in a for-loop.

Marcus Boden
0

Take a look at sbatch instead of srun; see the sbatch documentation for details.

#!/bin/bash
ctr=0
while [[ $ctr -lt 100 ]]; do
   # sbatch options such as -n must come before the script name,
   # otherwise they are passed as arguments to worker_script.sh;
   # sbatch also returns as soon as the job is queued, so no '&' is needed
   sbatch -n 1 worker_script.sh $ctr
   ((ctr++))
done

srun is interactive and blocking, whereas sbatch submits the job to the cluster and writes stdout/stderr to a file.
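If you want to control where each job's output ends up, you can add the -o option when submitting (a sketch; the file name worker_%j.out is just an example, %j expands to the job ID):

# submit one single-task job; its output is written to worker_<jobid>.out
sbatch -n 1 -o "worker_%j.out" worker_script.sh 42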

Maarten-vd-Sande
  • calling sbatch 100 times would create 100 jobs of 1 task each. I would prefer to create 1 job of 100 tasks, because conceptually they are 1 unit. Is 1 job of 100 tasks not doable? – user3240688 Aug 24 '20 at 22:15