My question is similar to this question.
Long story short, I want to use all available CPU cores, over as many nodes as possible.
The difference is that instead of a single job that is an MPI program, my job consists of N independent tasks, each using 1 core. N could potentially be greater than the total number of available cores, in which case some tasks would simply have to wait.
For example, say I have a cluster of 32 cores, and I'd like to run the same program (worker_script.sh) 100 times, each time with a different input. Each call to worker_script.sh is one task. I would like the first 32 tasks to run while the remaining 68 are queued; as cores free up, the later tasks would run. My job is considered finished once all 100 tasks are done running.
What is the proper way to do that? I wrote the following script and submitted it with sbatch, but it just runs everything on the same core, so it ended up taking forever.
#!/bin/bash
ctr=0
while [[ $ctr -lt 100 ]]; do
    # launch each task in the background so the loop doesn't block
    srun worker_script.sh $ctr &
    ((ctr++))
done
wait   # block until all background tasks have finished
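My current suspicion is that each srun call has to be told to claim only one task's worth of the allocation, something like the sketch below (the #SBATCH --ntasks line and the --exclusive -n1 -N1 flags are my guesses at the fix, not something I've verified to work):

#!/bin/bash
#SBATCH --ntasks=32           # request 32 single-core task slots for the whole job
ctr=0
while [[ $ctr -lt 100 ]]; do
    # -n1 -N1 should limit each step to one task on one node, and --exclusive
    # (per step) should make srun wait for a free slot instead of piling
    # everything onto the same core
    srun --exclusive -n1 -N1 worker_script.sh $ctr &
    ((ctr++))
done
wait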
Alternatively, I could invoke my original loop script directly, without sbatch. That seemed to do the trick: it took over all 32 cores and queued up everything else. When cores freed up, they were allocated to the remaining calls to worker_script.sh. Eventually, all 100 jobs finished, out of order of course, as expected.
The difference is that instead of 1 job of 100 tasks, it was 100 jobs of 1 task each.
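If 100 separate jobs really is the right model, I assume the cleaner way to express that is a job array rather than a shell loop; a minimal sketch, where the %32 throttle and using SLURM_ARRAY_TASK_ID as the input index are my assumptions:

#!/bin/bash
#SBATCH --array=0-99%32       # 100 array elements, at most 32 running at once
#SBATCH --ntasks=1            # each element is a single one-core task
#SBATCH --cpus-per-task=1
srun worker_script.sh $SLURM_ARRAY_TASK_ID   # array index doubles as the worker's input

That would be submitted once with sbatch, but it still amounts to 100 separate jobs (array elements) rather than 100 tasks within one job, which is exactly what my question is about.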
Is there a reason I can't run 100 independent tasks within one job? Am I fundamentally wrong to begin with? Should I be doing 100 jobs instead of 100 tasks?