
I will preface this by saying I'm new to parallel processing. I'm working on getting better, but I can't find an answer to my problem, which seems to be fairly unusual.

I am having trouble with this piece of code:

from joblib import Parallel, delayed
import multiprocessing

n_cores = multiprocessing.cpu_count()
Parallel(n_jobs=n_cores)(delayed(blexon)(gene,genomes) for gene in genes)

'genes' and 'genomes' are lists of strings, and 'genes' can contain hundreds of entries. I'm using Parallel to run blexon on every gene, and this works! If you think of the genes that run at the same time as a 'set', then for the first few sets all cores of my computer are used. After a few sets, though, the program only uses one core. With 12 cores, for example, 12 genes run simultaneously for the first few sets, but at some point only one gene runs at a time.

I have found information about Parallel using only one core with scipy (which I use in this script), but this is strange behavior: it uses all the cores temporarily before switching to using a single core.

I'm not sure how to fix this problem. Does anyone have any input?

Thank you.

System: Ubuntu 16.04 LTS, Python 3.5.2, joblib 0.9.4

---EDIT---

Here is the bit of code I used to try to address the cpu affinity problem:

import os
import psutil

p = psutil.Process(os.getpid())
print(p.cpu_affinity())
p.cpu_affinity(range(multiprocessing.cpu_count()))
print(p.cpu_affinity())

Output: (This computer has 12 cores)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
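The check above runs in the parent process, so it may also be worth looking at the affinity from inside the workers. Below is a minimal sketch of such a check, assuming Linux (os.sched_getaffinity is not available on every platform). The helper name worker_affinity is hypothetical and only for illustration.

```python
import os


def worker_affinity(_task=None):
    """Return (pid, allowed CPUs) for whichever process runs this call."""
    # Assumption: Linux only. Passing 0 means "the calling process".
    return os.getpid(), sorted(os.sched_getaffinity(0))


# Called directly, this reports the current process's own view.
pid, cpus = worker_affinity()
print(pid, cpus)
```

Dispatching it the same way as blexon, for example Parallel(n_jobs=n_cores)(delayed(worker_affinity)(i) for i in range(n_cores)), should return one (pid, cpus) pair per task, which would show whether any worker is restricted to fewer CPUs than the parent.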

---EDIT---

After watching htop while the program is running (each iteration can take a few minutes), I find that the number of cores in use decreases gradually. I started with 12 cores, and after this set only 11 are running in parallel.
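To put numbers on what htop shows, one option is to record which process ran each task and when. This is only a diagnostic sketch, not a fix; the helper names timed_call and concurrency_timeline are hypothetical.

```python
import os
import time


def timed_call(func, *args):
    """Run func(*args) and return (pid, start, end, result)."""
    start = time.time()
    result = func(*args)
    return os.getpid(), start, time.time(), result


def concurrency_timeline(records):
    """Return [(time, tasks_running)] after every task start or end."""
    # Tuples sort by time first, then by delta, so at equal timestamps
    # an end (-1) is processed before a start (+1).
    events = sorted(
        [(start, 1) for _pid, start, _end, _res in records]
        + [(end, -1) for _pid, _start, end, _res in records]
    )
    running = 0
    timeline = []
    for t, delta in events:
        running += delta
        timeline.append((t, running))
    return timeline


# Hand-made records (pid, start, end, result) to show the output shape.
example = [(1, 0.0, 2.0, None), (2, 0.0, 3.0, None), (3, 2.0, 4.0, None)]
print(concurrency_timeline(example))
```

With joblib, the records could be collected as Parallel(n_jobs=n_cores)(delayed(timed_call)(blexon, gene, genomes) for gene in genes). The timeline would then show whether the number of running tasks really drops over time, and the pids would show how many distinct workers did the work.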

  • Is this a problem of setting cpu affinity? I use psutil for that. – hamster on wheels Feb 22 '17 at 20:00
  • The most obvious fixes for that (http://stackoverflow.com/questions/16323743/ipython-parallel-not-using-multicore) do not work. I'll edit my question to add the code I tried to address that. – Chris Leonard Feb 22 '17 at 23:16
  • I was using `p.cpu_affinity(cpu_id)`, where `cpu_id` is a single integer instead of a range. This binds the process to the single CPU with that `cpu_id`. I also used a different `cpu_id` for each subprocess. I think the code you used would allow the process to run on any of the 12 CPUs specified by that range, instead of forcing it to run only on a specific CPU. See the example for `cpu_affinity` at https://pythonhosted.org/psutil/ – hamster on wheels Feb 23 '17 at 02:35
