I'm trying the following code:
    import multiprocessing
    import time
    import random

    def square(x):
        return x**2

    pool = multiprocessing.Pool(4)
    l = [random.random() for i in xrange(10**8)]

    # time the parallel version
    now = time.time()
    pool.map(square, l)
    print time.time() - now

    # time the serial version
    now = time.time()
    map(square, l)
    print time.time() - now
and the pool.map version consistently runs several seconds slower than the normal map version (19 seconds vs. 14 seconds).
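In case chunking is relevant: my understanding is that pool.map already splits the input list into chunks before sending it to the workers, but for reference, this (untested) variant, continuing from the script above, is what I'd try to amortize the per-item messaging; the chunksize value is an arbitrary guess on my part.

    # Untested variant: pass an explicit chunksize so each worker receives
    # a few large batches instead of many small ones (value chosen arbitrarily).
    now = time.time()
    pool.map(square, l, chunksize=10**6)
    print time.time() - now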
I've looked at the questions "Why is multiprocessing.Pool.map slower than builtin map?" and "multiprocessing.Pool() slower than just using ordinary functions",
and they seem to chalk it up to either IPC overhead or disk saturation, but neither of those is obviously the issue in my example: I'm not writing or reading anything to/from disk, and the computation is long enough that the IPC overhead should be small compared to the total time saved by the multiprocessing (I'm estimating that, since I'm doing the work on 4 cores instead of 1, the computation time should drop from 14 seconds to about 3.5 seconds). I also don't think I'm saturating my CPU: cat /proc/cpuinfo shows that I have 4 cores, but even when I limit the pool to only 2 processes it's still slower than the normal map (and even slower than the 4-process run).

What else could be slowing down the multiprocessed version? Am I misunderstanding how IPC overhead scales?
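To make that estimate concrete, this is the kind of sanity check I have in mind but haven't actually run (heavy_square and the loop count are values I made up): if per-item IPC cost were the problem, making each item far more expensive to compute should let pool.map pull ahead of the plain map.

    import multiprocessing
    import time
    import random

    # Made-up function: inflate the per-item work so that, if my model is
    # right, the IPC cost per item becomes negligible next to compute time.
    def heavy_square(x):
        total = 0.0
        for _ in xrange(1000):
            total += x**2
        return total

    if __name__ == '__main__':
        pool = multiprocessing.Pool(4)
        data = [random.random() for i in xrange(10**5)]

        now = time.time()
        pool.map(heavy_square, data)
        print 'pool.map: ', time.time() - now

        now = time.time()
        map(heavy_square, data)
        print 'plain map:', time.time() - now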
If it's relevant, this code is written in Python 2.7, and my OS is Linux Mint 17.2.