
I am testing Python's multiprocessing module with the following code:

from multiprocessing import Pool
import math

def f(x):
    for i in range(1,1000):
        y = (x*x)/math.sqrt(i)
    return y

if __name__ == '__main__':
    p = Pool(7)
    print(p.map(f, range(20000)))

This gives me a time of around 39.8s with 7 cores. I verified in Task Manager that all 7 cores were being used.

The single core implementation:

print [f(x) for x in range(20000)]

takes 38s to complete. My understanding is that this is a very CPU-intensive task, so the 7-core version should be faster than the single-core one. Why am I not seeing any improvement in performance (in fact, I'm seeing a decrease)?
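For reference, the comparison above can be reproduced with a minimal timing harness. This is a sketch, not code from the original post: the worker `f` is copied from the question, but `n` is reduced from 20000 so it runs quickly, and `time.perf_counter` is used for the measurements.

```python
import math
import time
from multiprocessing import Pool


def f(x):
    # CPU-bound busy work: y is reassigned each iteration and only
    # the final value is returned, exactly as in the question.
    for i in range(1, 1000):
        y = (x * x) / math.sqrt(i)
    return y


if __name__ == '__main__':
    n = 2000  # smaller than the original 20000 so the sketch finishes fast

    start = time.perf_counter()
    single = [f(x) for x in range(n)]
    t_single = time.perf_counter() - start

    start = time.perf_counter()
    with Pool(7) as p:
        multi = p.map(f, range(n))
    t_multi = time.perf_counter() - start

    assert single == multi  # both versions compute identical results
    print(f"single: {t_single:.3f}s  pool: {t_multi:.3f}s")
```

The `if __name__ == '__main__':` guard matters on Windows, where worker processes are spawned (not forked) and re-import the main module.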

Varun Balupuri
  • If you are using python 2 (which you might be judging from the `print` statement) then try using `xrange` instead of `range`. Can't say it would make that much difference, but every little helps. – cdarke Apr 06 '17 at 08:42
  • Your `for` loop in function `f` looks a little strange. You appear to be looping for no good reason - you reassign `y` on each iteration and only use the final result. – cdarke Apr 06 '17 at 08:45
  • 1
    are you on Windows? [This question](http://stackoverflow.com/questions/1289813/python-multiprocessing-vs-threading-for-cpu-bound-work-on-windows-and-linux?rq=1) discusses the cost associated with starting the pool – asongtoruin Apr 06 '17 at 08:47
  • As @asongtoruin mentions, this may be a borderline issue of task cost versus Pool cost. On my machine (Linux), the difference in runtime is hard to quantify too (approx. 1-2 seconds with Pool, 2-3 without), but when using `range(200000)`, the Pool version is clearly faster. – Hans Apr 06 '17 at 08:54
  • @cdarke I know I am re-assigning y upon each iteration, this is just to add extra computation to the function to increase the CPU time required. – Varun Balupuri Apr 06 '17 at 08:57
  • @Hans, interesting. I will try this on a unix machine later. According to Byron Whitlock on the other thread: 'processes are much more lightweight under UNIX variants. Windows processes are heavy and take much more time to start up.' What kind of speed differences do you get on unix please? – Varun Balupuri Apr 06 '17 at 09:05
  • @comradevaz: some compilers would optimise that loop out; check the byte-code to confirm that this has not happened here. – cdarke Apr 06 '17 at 09:06
  • @comradevaz: When I add one zero to the main range loop (i.e., `range(200000)`), the Pool version takes approx. 10 seconds, and the single-thread version around 30 seconds. (I guess the multiprocessing overhead makes this runtime factor be only three.) Interestingly, this is with Python2 - my Python3 takes approx. 1.5 times longer for each version (15 secs multi/45 secs single thread). – Hans Apr 06 '17 at 09:27
  • I have tried this on our linux machines with 40 processors - Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz - the speed result improvements are stunning. 0.63 seconds with 7 cores. Roughly 7 times faster than single core. I guess Windows is really clunky for multiprocessing – Varun Balupuri Apr 07 '17 at 09:33
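As the comments suggest, per-task overhead (process startup, which is especially heavy on Windows, plus pickling each argument and result between the parent and the workers) can swamp small tasks. One mitigation is `Pool.map`'s `chunksize` parameter, which batches many tasks into a single round-trip. A sketch, assuming the same worker `f` as in the question; the chunk size of 1000 is an illustrative choice, not a tuned value:

```python
import math
from multiprocessing import Pool


def f(x):
    # Same CPU-bound worker as in the question.
    for i in range(1, 1000):
        y = (x * x) / math.sqrt(i)
    return y


if __name__ == '__main__':
    with Pool(7) as p:
        # chunksize=1000 sends work in batches of 1000 tasks, so the
        # parent and workers exchange ~20 messages instead of ~20000.
        results = p.map(f, range(20000), chunksize=1000)
    print(len(results))  # 20000 results, same order as the input
```

Whether this helps depends on how large each task is relative to the IPC cost; for the big speedups seen on the 40-core Linux box, cheap `fork`-based process startup is likely the dominant factor.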

0 Answers