I'm trying to parallelize some calculations that use numpy with the help of Python's multiprocessing module. Consider this simplified example:
import time
import numpy
from multiprocessing import Pool

def test_func(i):
    a = numpy.random.normal(size=1000000)
    b = numpy.random.normal(size=1000000)
    # Repeatedly swap a and b through arithmetic, just to keep the CPU busy.
    for _ in range(2000):
        a = a + b
        b = a - b
        a = a - b
    return 1

if __name__ == "__main__":
    # Time a single sequential run.
    t1 = time.time()
    test_func(0)
    single_time = time.time() - t1
    print("Single time:", single_time)

    # Time n_par copies of the same work running in parallel.
    n_par = 4
    pool = Pool()
    t1 = time.time()
    results_async = [
        pool.apply_async(test_func, [i])
        for i in range(n_par)]
    results = [r.get() for r in results_async]
    multicore_time = time.time() - t1
    print("Multicore time:", multicore_time)
    print("Efficiency:", single_time / multicore_time)
When I execute it, multicore_time is roughly equal to single_time * n_par, while I would expect it to be close to single_time. Indeed, if I replace the numpy calculations with just time.sleep(10), I get exactly that: perfect efficiency. But for some reason it does not work with numpy. Can this be solved, or is it some internal limitation of numpy?
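For concreteness, here is the kind of replacement I mean: a minimal sketch of the sleep-based variant (the name test_func_sleep is my own, purely for illustration), dropped into the same timing harness as above:

import time

def test_func_sleep(i):
    # Pure waiting: no CPU work and no memory traffic, so the
    # worker processes have nothing to contend over.
    time.sleep(10)
    return 1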
Some additional info which may be useful:
- I'm using OS X 10.9.5, Python 3.4.2, and the CPU is a Core i7 with (as reported by the system info) 4 cores (although the above program only takes 50% of CPU time in total, so the system info may not be taking hyperthreading into account).
- When I run this, I see n_par processes in top working at 100% CPU.
- If I replace the numpy array operations with a loop and per-index operations, the efficiency rises significantly (to about 75% for n_par = 4); see the sketch after this list.
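To be concrete about the last point, here is a minimal sketch of the per-index variant (the name test_func_indexed is my own, and the sizes are deliberately much smaller than in the original, since a pure-Python loop is orders of magnitude slower than the vectorized numpy version):

import numpy

def test_func_indexed(i):
    a = numpy.random.normal(size=10000)
    b = numpy.random.normal(size=10000)
    for _ in range(20):
        # Element-by-element Python loop instead of whole-array numpy operations.
        for j in range(len(a)):
            a[j] = a[j] + b[j]
            b[j] = a[j] - b[j]
            a[j] = a[j] - b[j]
    return 1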