
I am testing Python's multiprocessing module with the following code:

from multiprocessing import Pool
import math

def f(x):
    for i in range(1,1000):
        y = (x*x)/math.sqrt(i)
    return y

if __name__ == '__main__':
    p = Pool(7)
    print(p.map(f, range(20000)))

This gives me a time of around 39.8s with 7 cores. I verified in Task Manager that all 7 cores were being used.

The single core implementation:

print [f(x) for x in range(20000)]

takes 38s to complete. My understanding is that this is a very CPU-intensive task, so the 7-core version should be faster than the single-core one. Why am I not seeing any improvement in performance (in fact, I'm seeing a decrease)?
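For reference, the comparison above can be reproduced with a minimal timing harness. This is a sketch, not code from the original post: the worker `f` is copied from the question, but `n` is reduced from 20000 so it runs quickly, and `time.perf_counter` is used for the measurements.

```python
import math
import time
from multiprocessing import Pool


def f(x):
    # CPU-bound busy work: y is reassigned each iteration and only
    # the final value is returned, exactly as in the question.
    for i in range(1, 1000):
        y = (x * x) / math.sqrt(i)
    return y


if __name__ == '__main__':
    n = 2000  # smaller than the original 20000 so the sketch finishes fast

    start = time.perf_counter()
    single = [f(x) for x in range(n)]
    t_single = time.perf_counter() - start

    start = time.perf_counter()
    with Pool(7) as p:
        multi = p.map(f, range(n))
    t_multi = time.perf_counter() - start

    assert single == multi  # both versions compute identical results
    print(f"single: {t_single:.3f}s  pool: {t_multi:.3f}s")
```

The `if __name__ == '__main__':` guard matters on Windows, where worker processes are spawned (not forked) and re-import the main module.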

Varun Balupuri
  • If you are using python 2 (which you might be judging from the `print` statement) then try using `xrange` instead of `range`. Can't say it would make that much difference, but every little helps. – cdarke Apr 06 '17 at 08:42
  • Your `for` loop in function `f` looks a little strange. You appear to be looping for no good reason - you reassign `y` on each iteration and only use the final result. – cdarke Apr 06 '17 at 08:45
  • 1
    are you on Windows? [This question](http://stackoverflow.com/questions/1289813/python-multiprocessing-vs-threading-for-cpu-bound-work-on-windows-and-linux?rq=1) discusses the cost associated with starting the pool – asongtoruin Apr 06 '17 at 08:47
  • As @asongtoruin mentions, this may be a borderline issue of task cost versus Pool cost. On my machine (Linux), the difference in runtime is hard to quantify too (approx. 1-2 seconds with Pool, 2-3 without), but when using `range(200000)`, the Pool version is clearly faster. – Hans Apr 06 '17 at 08:54
  • @cdarke I know I am re-assigning y upon each iteration, this is just to add extra computation to the function to increase the CPU time required. – Varun Balupuri Apr 06 '17 at 08:57
  • @Hans, interesting. I will try this on a unix machine later. According to Byron Whitlock on the other thread: 'processes are much more lightweight under UNIX variants. Windows processes are heavy and take much more time to start up.' What kind of speed differences do you get on unix please? – Varun Balupuri Apr 06 '17 at 09:05
  • @comradevaz: some compilers would optimise that loop out; check the byte-code to confirm that this has not happened here. – cdarke Apr 06 '17 at 09:06
  • @comradevaz: When I add one zero to the main range loop (i.e., `range(200000)`), the Pool version takes approx. 10 seconds, and the single-thread version around 30 seconds. (I guess the multiprocessing overhead makes this runtime factor be only three.) Interestingly, this is with Python2 - my Python3 takes approx. 1.5 times longer for each version (15 secs multi/45 secs single thread). – Hans Apr 06 '17 at 09:27
  • I have tried this on our linux machines with 40 processors - Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz - the speed result improvements are stunning. 0.63 seconds with 7 cores. Roughly 7 times faster than single core. I guess Windows is really clunky for multiprocessing – Varun Balupuri Apr 07 '17 at 09:33
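As the comments suggest, per-task overhead (process startup, which is especially heavy on Windows, plus pickling each argument and result between the parent and the workers) can swamp small tasks. One mitigation is `Pool.map`'s `chunksize` parameter, which batches many tasks into a single round-trip. A sketch, assuming the same worker `f` as in the question; the chunk size of 1000 is an illustrative choice, not a tuned value:

```python
import math
from multiprocessing import Pool


def f(x):
    # Same CPU-bound worker as in the question.
    for i in range(1, 1000):
        y = (x * x) / math.sqrt(i)
    return y


if __name__ == '__main__':
    with Pool(7) as p:
        # chunksize=1000 sends work in batches of 1000 tasks, so the
        # parent and workers exchange ~20 messages instead of ~20000.
        results = p.map(f, range(20000), chunksize=1000)
    print(len(results))  # 20000 results, same order as the input
```

Whether this helps depends on how large each task is relative to the IPC cost; for the big speedups seen on the 40-core Linux box, cheap `fork`-based process startup is likely the dominant factor.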

0 Answers