The issue is that your `taskset` limits the CPUs that `parallel` can run on to 4 CPUs. I'm fairly sure that the child processes of `parallel`, meaning each instance of `my_bash_script.sh` and the Python processes it launches, will also inherit that same set of CPU affinities, so they too will be able to run on any of the 4 CPUs you specified.
What you want, I think, is to limit each Python process started by `parallel` to a different CPU, and you can do that by moving the `taskset` into `my_bash_script.sh`. Specifically, don't limit where `parallel` runs, but do limit where the Python processes it starts can run by wrapping the calls to `python` in `my_bash_script.sh` with `taskset`. You're passing a number to each call to `my_bash_script.sh`, so you can use that to compute a different CPU for each `python` call. Currently you could use `$(($1 - 1))`, since you're passing in values 1 - 4, but if you have more you'd want to take the value mod 4 or something.
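As a rough sketch of what that could look like inside `my_bash_script.sh`: the script below maps the number it receives to a CPU index and wraps the Python call in `taskset -c`. The names `NUM_CPUS`, `cpu_for_job`, `run_pinned`, and `my_script.py` are placeholders I made up for illustration, since I don't know what your script actually runs.

```bash
#!/usr/bin/env bash
# Sketch of my_bash_script.sh: pin the Python process to one CPU chosen
# from the job number passed as $1.
# Assumption: 4 CPUs are available, numbered 0-3.
NUM_CPUS=4

# Map a 1-based job number to a 0-based CPU index, wrapping around with
# mod so that job 5 lands on CPU 0 again.
cpu_for_job() {
    local job_number=$1
    echo $(( (job_number - 1) % NUM_CPUS ))
}

# Run a command restricted to the CPU computed for this job number.
run_pinned() {
    local job_number=$1
    shift
    taskset -c "$(cpu_for_job "$job_number")" "$@"
}

# Only launch Python when a job number was actually given.
# my_script.py is a hypothetical name for whatever you run today.
if [ "$#" -ge 1 ]; then
    run_pinned "$1" python my_script.py "$1"
fi
```

You would then call `parallel` without the outer `taskset`, so only the Python processes are restricted.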
Note that what I'm describing should work but it's imperfect. Suppose you ran process 1 on CPU 1, process 2 on CPU 2, etc. If process 1 finishes first then nothing is using CPU 1 and nothing will since you've limited the others to 1 CPU each. Ideally you'd like to have them be able to take over the now idle CPU but that's more complex.
Edit: @Ole Tang's answer above suggests using `{%}` to use the job slot instead of assigning tasks to CPUs based on the order you submit them, which helps with (but does not eliminate) the problem I described above.
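A minimal sketch of the job slot approach, assuming GNU parallel with `-j4` so that `{%}` takes the values 1 to 4. The script name `my_script.py`, the wrapper function `run_all`, and the inputs 1 to 8 are placeholders for illustration, not taken from your setup.

```bash
#!/usr/bin/env bash
# Sketch: pin each job to the CPU that matches its GNU parallel job slot.
# {%} is replaced by the slot number (1..4 with -j4), so slot - 1 gives
# CPU indices 0..3. The single quotes keep $(( ... )) from being evaluated
# until parallel has substituted {%} and started the job's shell.
run_all() {
    parallel -j4 'taskset -c $(( {%} - 1 )) python my_script.py {}' ::: 1 2 3 4 5 6 7 8
}

# Call run_all to start the jobs; it is left uncalled here so the
# sketch can be sourced without launching anything.
```

Because a slot is reused as soon as its job finishes, the next queued job picks up the CPU that just became free, as long as there are still jobs waiting.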
I am wondering why you want to limit them to 1 CPU each. It is true that if many are running at a time, the kernel will try to time-slice them to give each thread equal resources, and that can actually hurt performance due to context switching, cache conflicts, etc. On the other hand, as noted above, the alternative is likely to leave some CPUs idle for at least some of the time, so it's not obvious which will end up giving you better performance.