
I'm using multiprocessing.Pool in Python on Ubuntu 12.04, and I'm running into a curious problem: when I call map_async on my Pool, it spawns 8 processes, but they all fight over a single core of my 8-core machine. The exact same code uses both cores on my MacBook Pro, and all four cores of my other Ubuntu 12.04 desktop (as measured with htop in all cases).

My code is too long to post in full, but the important part is:

P = multiprocessing.Pool()
results = P.map_async( unwrap_self_calc_timepoint, zip([self]*self.xLen,xrange(self.xLen)) ).get(99999999999)
P.close()
P.join()
ipdb.set_trace()

where unwrap_self_calc_timepoint is a module-level wrapper function that passes the necessary self argument into the class method, based on the advice of this article.
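
For reference, here is a minimal, self-contained sketch of that wrapper pattern. The class name Work and the method calc_timepoint are hypothetical placeholders for my real code; only unwrap_self_calc_timepoint matches the snippet above:

import multiprocessing

def unwrap_self_calc_timepoint(arg):
    # arg is one (instance, index) pair from zip([self]*self.xLen, range(self.xLen))
    obj, i = arg
    return obj.calc_timepoint(i)

class Work(object):
    def __init__(self, xLen):
        self.xLen = xLen

    def calc_timepoint(self, i):
        return i * i  # placeholder for the real per-timepoint computation

    def run(self):
        P = multiprocessing.Pool()
        args = zip([self] * self.xLen, range(self.xLen))
        results = P.map_async(unwrap_self_calc_timepoint, args).get(99999999999)
        P.close()
        P.join()
        return results

if __name__ == '__main__':
    print(Work(8).run())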

All three computers are using Python 2.7.3, and I don't really know where to start in hunting down why that one Ubuntu computer is acting up. Any help on how to begin narrowing the problem down would be appreciated. Thank you!

Warren Weckesser
staticfloat
  • what do you see in htop? – Ruggero Turra Sep 25 '12 at 23:05
  • It shows 8 python processes, each running at 12 percent, with 1 core out of 8 maxed out at 100% – staticfloat Sep 26 '12 at 14:18
  • I am having exactly the same problem. My code worked perfectly fine using all 8 available cores on Ubuntu 10.10, but only uses 1 core after upgrading to 12.04. Have you found any solution? – user1084871 Mar 02 '13 at 14:21
  • Not yet, I gave up on trying to do multiprocessing in Python, and just wrote the algorithm in [Julia](http://julialang.org) instead, getting the speed I needed. – staticfloat Mar 03 '13 at 00:29
  • possible duplicate of [Why does multiprocessing use only a single core after I import numpy?](http://stackoverflow.com/questions/15639779/why-does-multiprocessing-use-only-a-single-core-after-i-import-numpy) – WoJ Aug 07 '15 at 13:51

2 Answers


I had the same problem; in my case the solution was to tell Linux to use all of the processors instead of only one. Try adding the following two lines at the beginning of your code:

import os
os.system("taskset -p 0xfffff %d" % os.getpid())
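
If you want to confirm that a skewed affinity mask really is the culprit, you can first just report the current mask (a quick check; taskset -p with only a PID prints the affinity of that process without changing it):

import os
os.system("taskset -p %d" % os.getpid())  # prints the current affinity mask for this process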

user2660966
  • That means you had the affinity skewed somewhere else in the settings for your system or process. Anyway, the advice to check the affinity is a good one. – ivan_pozdeev Nov 10 '14 at 21:48

This seems to be a fairly common issue between numpy and certain Linux distributions: importing numpy can leave the process pinned to a single core (see the question linked in the comments above). I haven't had any luck calling taskset near the start of the program, but it does the trick when called inside the code being parallelized:

import multiprocessing as mp
import numpy as np
import os

def something(arg):
    # Reset this worker's CPU affinity so it can run on any core,
    # then do some CPU-heavy numpy work. The pool argument is unused.
    os.system("taskset -p 0xfffff %d" % os.getpid())
    X = np.random.randn(5000, 2000)
    Y = np.random.randn(2000, 5000)
    Z = np.dot(X, Y)
    return Z.mean()

pool = mp.Pool(processes=10)
out = pool.map(something, np.arange(20))
pool.close()
pool.join()
Russ