How do I limit each Python process launched via GNU parallel to one CPU?

Question

If I run this command

$ seq 1 4 | taskset -c 0-3 parallel -j4 -u <my_bash_script.sh>

Then each Python process started by <my_bash_script.sh> runs on multiple CPUs instead of one. The Python code uses both numpy and pytorch. So the option taskset -c 0-3 imposes the maximum number of CPUs, but it doesn't guarantee that each process will be limited to one CPU.

I've tried

$ export OPENBLAS_NUM_THREADS=1
$ export MKL_NUM_THREADS=1

but it didn't work

I've also added to the python script

import mkl
mkl.set_num_threads(1)

but it didn't help

It would help if you actually showed us the Python script... *Generally*, Python will not use more than one core unless you use `multiprocessing` or `concurrent.futures`. — Roland Smith, Aug 23 '20 at 21:25
@RolandSmith he indicates he's using numpy and pytorch both of which are C extensions and they do run on multiple CPUs. — Oliver Dain, Aug 23 '20 at 22:50
@OliverDain At least for numpy it is more correct to say that it *may* use multiple cores. It really depends on how numpy was built. Especially with which BLAS library it was built. And even then only those operations that use BLAS use multiple cores. — Roland Smith, Aug 24 '20 at 06:24
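Following up on the comment about BLAS builds: since which thread-count variable matters depends on how numpy and pytorch were built, one hedged approach is to set all the usual candidates at the very top of the Python script, before numpy or torch is imported, and additionally cap pytorch's own intra-op CPU thread pool with torch.set_num_threads. This is a minimal sketch, assuming the installed builds honour OMP_NUM_THREADS, OPENBLAS_NUM_THREADS or MKL_NUM_THREADS; the helper name limit_torch_threads is hypothetical, and whether this is enough depends on the specific builds.

```python
import os

# Assumption: these are the thread-count variables the installed numpy/pytorch
# builds honour (OpenMP, OpenBLAS, MKL). Which one matters depends on the build.
THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")

# This must run before numpy or torch is imported: the libraries generally read
# these variables when they initialise. setdefault keeps any value already exported.
for var in THREAD_VARS:
    os.environ.setdefault(var, "1")


def limit_torch_threads(n=1):
    """Hypothetical helper: cap pytorch's intra-op CPU threads if torch is installed."""
    try:
        import torch
    except ImportError:
        return False
    torch.set_num_threads(n)
    return True


if __name__ == "__main__":
    print({var: os.environ[var] for var in THREAD_VARS})
    print("torch threads limited:", limit_torch_threads(1))
    # Import numpy / torch and do the real work only after this point.
```

If the variables are exported in the shell instead, the same ordering concern does not arise, because they are already in the environment when the interpreter starts.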

score 1 · Answer 1 · answered Aug 24 '20 at 06:10

Use jobslot:

$ seq 1 4 | parallel -j4 -u taskset -c {%} <my_bash_script.sh>

Jobslot is built for this. Imagine you have a lot more than 4 jobs. If you give every 4th job to CPU 4, you risk that every 4th job is shorter than the others, in which case CPU 4 will be idle even though there are more jobs to be run.

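Jobslot does not pass every 4th job to CPU 4. Instead, it looks at which CPU (or rather, which jobslot) finished a job, and then starts a new job on that CPU. Note that {%} counts job slots from 1, so with -j4 the command above uses CPUs 1 to 4 rather than 0 to 3.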
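The difference between the two strategies can be modelled with a toy simulation. It does not run GNU parallel, and the job durations are made-up numbers. Fixed assignment sends job i to CPU i mod 4, while the jobslot strategy hands each new job to whichever slot becomes free first, tracked here with a min-heap of finish times.

```python
import heapq


def fixed_assignment_makespan(durations, cpus):
    """Job i always goes to CPU i % cpus; return when the busiest CPU finishes."""
    busy = [0] * cpus
    for i, d in enumerate(durations):
        busy[i % cpus] += d
    return max(busy)


def jobslot_makespan(durations, slots):
    """Each new job starts in whichever slot frees up first."""
    free_at = [0] * slots  # time at which each slot becomes free (all zeros is a valid heap)
    for d in durations:
        start = heapq.heappop(free_at)      # earliest-free slot
        heapq.heappush(free_at, start + d)  # that slot is busy until start + d
    return max(free_at)


if __name__ == "__main__":
    # Every 4th job is short, the situation described above.
    jobs = [5, 5, 5, 1] * 4
    print("fixed assignment:", fixed_assignment_makespan(jobs, 4))  # 20
    print("jobslot:", jobslot_makespan(jobs, 4))                    # 16
```

With these numbers, fixed assignment leaves CPU 4 idle for 16 of the 20 time units, while the jobslot strategy finishes all 64 units of work in 16, with no slot idle.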

(Also: Since you are using -u you should learn the difference between --group (default) and --linebuffer (which is often what you really want when using -u)).

Answer 2 · answered by Oliver Dain, Aug 24 '20 at 22:30

The issue is that your taskset limits the CPUs that parallel can run on to 4 CPUs. I'm fairly sure that the child processes of parallel (each instance of my_bash_script.sh and the Python processes it launches) will also inherit that same set of CPU affinities, so they too will be able to run on any of the 4 CPUs you specified.
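The inheritance can be checked from Python on Linux. The sketch below narrows the current process to a single CPU with os.sched_setaffinity, starts a child interpreter, and reads the child's mask with os.sched_getaffinity. Per the Linux sched_setaffinity(2) man page, a forked child inherits its parent's affinity mask and the mask is preserved across execve, which is why taskset applies to everything parallel launches. The function names are illustrative, and these os calls are not available on every platform (they are on Linux).

```python
import ast
import os
import subprocess
import sys


def child_affinity():
    """Start a child Python interpreter and return the set of CPUs it may run on."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(sorted(os.sched_getaffinity(0)))"],
        capture_output=True, text=True, check=True,
    )
    return set(ast.literal_eval(result.stdout.strip()))


def demo_inheritance():
    """Pin this process to one CPU, then show that a child sees the same mask."""
    original = os.sched_getaffinity(0)
    cpu = min(original)
    os.sched_setaffinity(0, {cpu})  # roughly what `taskset -c <cpu>` does
    try:
        return cpu, child_affinity()
    finally:
        os.sched_setaffinity(0, original)  # restore the original mask


if __name__ == "__main__":
    cpu, seen = demo_inheritance()
    print(f"parent pinned to CPU {cpu}; child reports {sorted(seen)}")
```

Restricting the parent to four CPUs instead of one would likewise leave every child free to move among those four, which matches the behaviour described in the question.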

What you want, I think, is to limit each Python process started by parallel to a different CPU, and you can do that by moving the taskset into my_bash_script.sh. Specifically, don't limit where parallel runs, but do limit where the Python processes it starts can run by wrapping the calls to python in my_bash_script.sh with taskset. You're passing a number to each call to my_bash_script.sh, so you can use it to compute a different CPU for each python call: currently you could use $(( $1 - 1 )) since you're passing in the values 1 to 4, but if you have more jobs you'd want to take the value mod 4, e.g. $(( ($1 - 1) % 4 )).
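The same idea can be sketched in Python instead of bash, assuming a hypothetical wrapper script named pin_and_run.py. It takes the number parallel passes in, maps it onto one of the CPUs the process is allowed to use, pins itself with os.sched_setaffinity, and then replaces itself with the real command via os.execvp, so the command keeps the single-CPU mask. Because both the seq values and GNU parallel's {%} start at 1, the sketch subtracts one before taking the modulus, and it indexes into the allowed CPU set rather than assuming the CPUs are numbered 0 to 3.

```python
import os
import sys


def cpu_for(n, allowed):
    """Map a 1-based job or slot number onto one of the allowed CPUs, round-robin."""
    cpus = sorted(allowed)
    return cpus[(n - 1) % len(cpus)]


def pin_to(n):
    """Restrict the current process (and anything it later starts) to one CPU."""
    cpu = cpu_for(n, os.sched_getaffinity(0))
    os.sched_setaffinity(0, {cpu})
    return cpu


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python3 pin_and_run.py N command [args...]
    pin_to(int(sys.argv[1]))
    # Replace this process with the real command; the affinity mask survives exec.
    os.execvp(sys.argv[2], sys.argv[2:])
```

Hypothetical usage, with my_script.py standing in for the real Python script:

$ seq 1 4 | parallel -j4 -u python3 pin_and_run.py {%} python3 my_script.py {}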

Note that what I'm describing should work, but it's imperfect. Suppose you ran process 1 on CPU 1, process 2 on CPU 2, etc. If process 1 finishes first, then nothing is using CPU 1, and nothing will, since you've limited the others to 1 CPU each. Ideally you'd like the other processes to be able to take over the now-idle CPU, but that's more complex.

Edit: @Ole Tange's answer above suggests using {%} to pick the CPU from the jobslot instead of assigning tasks to CPUs based on the order you submit them, which helps with (but does not eliminate) the problem I described above.

I am wondering why you want to limit them to 1 CPU each. It is true that if many are running at a time, the kernel will try to time-slice them to give each thread equal resources, and that can actually hurt performance due to context switching, cache conflicts, etc. OTOH, as noted above, the alternative is likely to leave some CPUs idle for at least some of the time, so it's not obvious which approach will end up giving you better performance.
