In the following code, I time how long it takes to pass a large array (8 MB) to a child process using the args
keyword when creating the process versus sending it through a pipe.
Does anyone have any insight into why it is so much faster to pass data using an argument than using a pipe?
Below, each code block is a cell in a Jupyter notebook.
import multiprocessing as mp
import random
N = 2**20
# 2**20 random floats (8 MB if stored as raw doubles)
x = [random.random() for _ in range(N)]
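The "8 MB" figure corresponds to 2**20 doubles at 8 bytes each. As a Python list, x actually holds 2**20 pointers to separate float objects, so the amount of memory the parent owns (and that a pipe has to serialize) is larger. A quick standalone sketch of the in-memory footprint; sys.getsizeof does not follow references, so the float objects are summed separately (64-bit CPython typically reports 24 bytes per float, which is an assumption about the interpreter, not something measured in the timings above):

```python
import random
import sys

N = 2**20
x = [random.random() for _ in range(N)]

# Size of the list object itself: header plus one pointer per element.
list_bytes = sys.getsizeof(x)
# getsizeof() does not follow references, so add the float objects separately.
float_bytes = sum(sys.getsizeof(v) for v in x)

print(f"raw doubles:   {8 * N / 1024**2:.2f} MB")
print(f"list object:   {list_bytes / 1024**2:.2f} MB")
print(f"float objects: {float_bytes / 1024**2:.2f} MB")
```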
Time the call to sum in the parent process (for comparison only):
%%timeit -n 5 -r 10 -p 8 -o -q
pass
y = sum(x)/N
t_sum = _
Time the call to sum from a child process, using the args keyword to pass list x to the child process.
def mean(x,q):
    # Runs in the child: sum the list received via args and send the result back
    q.put(sum(x))
%%timeit -n 5 -r 10 -p 8 -o -q
pass
q = mp.Queue()
p = mp.Process(target=mean,args=(x,q))
p.start()
p.join()
s = q.get()
m = s/N
t_mean = _
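One factor that may matter for the args timing (an assumption to check, not something measured here): with the "fork" start method the child inherits the parent's memory, so the list in args is never pickled, whereas "spawn" and "forkserver" pickle the Process object, including its args, and send it to the child. A quick way to see which start method the interpreter is using:

```python
import multiprocessing as mp
import sys

# Default start method for this interpreter/platform. Assumed behavior:
#   "fork"       child inherits the parent's memory; Process args are not pickled
#   "spawn"      Process object (including args) is pickled and sent to a fresh interpreter
#   "forkserver" Process object is likewise pickled and sent to the forked child
method = mp.get_start_method()
available = mp.get_all_start_methods()

print(f"platform={sys.platform} default={method} available={available}")
```

To compare under pickling, the args cell could use ctx = mp.get_context("spawn") with ctx.Process and ctx.Queue; note that with spawn, functions defined in a notebook cell generally cannot be found by the child.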
Time using a pipe to pass the data to the child process:
def mean_pipe(cp,q):
    # Runs in the child: receive the list through the pipe, then sum it
    x = cp.recv()
    q.put(sum(x))
%%timeit -n 5 -r 10 -p 8 -o -q
pass
q = mp.Queue()
pipe0,pipe1 = mp.Pipe()
p = mp.Process(target=mean_pipe,args=[pipe0,q])
p.start()
pipe1.send(x)
p.join()
s = q.get()
m = s/N
t_mean_pipe = _
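Connection.send() pickles its argument and the receiving end must unpickle it, so the pipe path pays for serializing every float in addition to copying the bytes through the OS pipe. A minimal standalone sketch that times only the pickling step for the same kind of list. It calls the pickle module directly rather than the ForkingPickler that multiprocessing uses internally, which is assumed to behave the same for a plain list of floats:

```python
import pickle
import random
import time

N = 2**20
x = [random.random() for _ in range(N)]

# Time the two serialization halves that a Pipe round trip implies.
t0 = time.perf_counter()
payload = pickle.dumps(x)      # roughly what send() does before writing
t1 = time.perf_counter()
y = pickle.loads(payload)      # roughly what recv() does after reading
t2 = time.perf_counter()

print(f"pickled size: {len(payload) / 1024**2:.2f} MB")
print(f"dumps: {1000 * (t1 - t0):.1f} ms, loads: {1000 * (t2 - t1):.1f} ms")
```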
(ADDED in response to a comment) Use the mp.Array shared-memory feature (very slow!):
def mean_pipe_shared(xs,q):
    # Runs in the child: sum the shared array element by element
    q.put(sum(xs))
%%timeit -n 5 -r 10 -p 8 -o -q
xs = mp.Array('d',x)
q = mp.Queue()
p = mp.Process(target=mean_pipe_shared,args=[xs,q])
p.start()
p.join()
s = q.get()
m = s/N
t_mean_shared = _
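A likely contributor, though how much of the 4195 ms it accounts for is not measured here: mp.Array defaults to lock=True and returns a synchronized wrapper whose __getitem__ acquires the lock on every access, so sum(xs) takes and releases the lock once per element. The timed cell also includes building the Array from a Python list. A sketch comparing per-element locking with reading the underlying ctypes array under a single lock acquisition (N is reduced to 2**18 only to keep it quick):

```python
import multiprocessing as mp
import time

N = 2**18
x = [float(i) for i in range(N)]

# Default lock=True: a synchronized wrapper around a shared ctypes array.
xs = mp.Array('d', x)

t0 = time.perf_counter()
s_locked = sum(xs)             # __getitem__ takes the lock for every element
t1 = time.perf_counter()

# get_obj() returns the raw ctypes array; hold the lock once for the whole pass.
with xs.get_lock():
    s_raw = sum(xs.get_obj())
t2 = time.perf_counter()

print(f"per-element locking: {1000 * (t1 - t0):.1f} ms")
print(f"get_obj():           {1000 * (t2 - t1):.1f} ms")
```

mp.Array('d', x, lock=False) or mp.RawArray('d', x) return the bare ctypes array with no lock at all, which may be acceptable when the child only reads the data.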
Print the results (data size in MB, best times in ms):
print("{:>20s} {:12.4f}".format("MB",8*N/1024**2))
print("{:>20s} {:12.4f}".format("mean (main)",1000*t_sum.best))
print("{:>20s} {:12.4f}".format("mean (args)",1000*t_mean.best))
print("{:>20s} {:12.4f}".format("mean (pipe)",1000*t_mean_pipe.best))
print("{:>20s} {:12.4f}".format("mean (shared)",1000*t_mean_shared.best))
                  MB       8.0000
         mean (main)       7.1931
         mean (args)      38.5217
         mean (pipe)     136.5020
       mean (shared)    4195.0568
Using the pipe is over 3 times slower than passing the list as an argument to the child process (136.5 ms vs. 38.5 ms). And unless I am doing something very wrong, mp.Array is a non-starter.
Why is the pipe so much slower than passing the data directly to the subprocess (using args)? And what's up with the shared memory?