In the following code, I time how long it takes to pass a large array (8 MB) to a child process using the args keyword when forking the process versus passing it through a pipe.

Does anyone have any insight into why it is so much faster to pass data using an argument than using a pipe?

Below, each code block is a cell in a Jupyter notebook.

import multiprocessing as mp
import random

N = 2**20
x = list(map(lambda x : random.random(),range(N)))

Time the call to sum in the parent process (for comparison only):


%%timeit -n 5 -r 10 -p 8 -o -q
pass

y = sum(x)/N

t_sum = _

Time the call to sum from a child process, using the args keyword to pass the list x to the child process.


def mean(x,q):
    q.put(sum(x))

%%timeit -n 5 -r 10 -p 8 -o -q
pass

q = mp.Queue()
p = mp.Process(target=mean,args=(x,q))
p.start()
p.join()
s = q.get()
m = s/N

t_mean = _

Time using a pipe to pass data to child process


def mean_pipe(cp,q):
    x = cp.recv()
    q.put(sum(x))

%%timeit -n 5 -r 10 -p 8 -o -q
pass

q = mp.Queue()
pipe0,pipe1 = mp.Pipe()
p = mp.Process(target=mean_pipe,args=[pipe0,q])
p.start()
pipe1.send(x)
p.join()
s = q.get()
m = s/N

t_mean_pipe = _

(ADDED in response to comment) Use mp.Array shared memory feature (very slow!)


def mean_pipe_shared(xs,q):
    q.put(sum(xs))

%%timeit -n 5 -r 10 -p 8 -o -q
xs = mp.Array('d',x)

q = mp.Queue()
p = mp.Process(target=mean_pipe_shared,args=[xs,q])
p.start()
p.join()
s = q.get()
m = s/N

t_mean_shared = _

Print out results (ms)

print("{:>20s} {:12.4f}".format("MB",8*N/1024**2))
print("{:>20s} {:12.4f}".format("mean (main)",1000*t_sum.best))
print("{:>20s} {:12.4f}".format("mean (args)",1000*t_mean.best))
print("{:>20s} {:12.4f}".format("mean (pipe)",1000*t_mean_pipe.best))
print("{:>20s} {:12.4f}".format("mean (shared)",1000*t_mean_shared.best))         

              MB       8.0000
     mean (main)       7.1931
     mean (args)      38.5217
     mean (pipe)     136.5020
   mean (shared)    4195.0568

Using the pipe is over 3 times slower than passing arguments to the child process. And unless I am doing something very wrong, mp.Array is a non-starter.

Why is the pipe so much slower than passing directly to the subprocess (using args)? And what's up with the shared memory?
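
One way to probe the first question is to time the serialization step on its own. The sketch below assumes that much of the pipe's extra cost comes from pickling: `Connection.send` pickles the object it is given and `recv` unpickles it, whereas with the fork start method the child inherits `args` without pickling. It creates neither a pipe nor a child process, so it measures only that one assumed component; the names `payload` and `y` are illustrative.

```python
import pickle
import random
import time

N = 2**20  # same list length as in the question
x = [random.random() for _ in range(N)]

# Assumption being tested: the pipe's cost is dominated by pickling.
# pickle.dumps / pickle.loads stand in for what send() / recv() do
# to the object before and after it crosses the pipe.
t0 = time.perf_counter()
payload = pickle.dumps(x)
t1 = time.perf_counter()
y = pickle.loads(payload)
t2 = time.perf_counter()

print("{:>20s} {:12.4f}".format("payload (MB)", len(payload) / 1024**2))
print("{:>20s} {:12.4f}".format("pickle (ms)", 1000 * (t1 - t0)))
print("{:>20s} {:12.4f}".format("unpickle (ms)", 1000 * (t2 - t1)))
```

If the pickle and unpickle times together account for most of the gap between the args and pipe rows, serialization is the likely cause; if they do not, copying through the pipe buffer would be the next thing to check.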

Donna
  • I *suspect* that using a pipe forces data to be copied, while a read-only list of arguments can be shared between processes by the kernel, and so no copying needs to take place (until the other process starts modifying the arguments, which then triggers a copy-on-write). Which OS is this? Can you repeat the experiment on other platforms? – tripleee Jan 25 '18 at 07:58
  • @tripleee I am running on OSX, but see the same behavior on Linux (Red Hat). If it were only sharing the data when the argument is passed, why does it take any time at all? Passing the argument is still about 5 times slower than calling from the parent process. The overhead of setting up the process is only about 10 ms. – Donna Jan 25 '18 at 08:22
  • Let me modify my previous guess. You are explicitly copying things in both cases, but when you are using the pipe, the kernel probably ends up copying it behind the scenes multiple times. See also https://stackoverflow.com/a/11710888/874188 – tripleee Jan 25 '18 at 08:30
  • Also, this: https://unix.stackexchange.com/questions/11946/how-big-is-the-pipe-buffer – tripleee Jan 25 '18 at 08:32
  • @tripleee Thanks for those links. Are the "pipe" communication times closer to what one would expect with true interprocessor communication, i.e. message passing? – Donna Jan 25 '18 at 13:39
  • Shared memory IPC should be faster in this scenario, and is "true" as far as I can tell. If you need things to be distributed you are obviously looking for mechanisms with higher latency at the expense of the convenience offered by local direct-memory access, so it really depends on what sort of scalability you are looking for. In a way, local pipes seem to be the worst of both worlds ... but the flexibility could be worth it if you *might* want to choose between a local and a remote socket. – tripleee Jan 25 '18 at 13:42
  • It just seems that passing using `args` is a one-off; it can be done only once, when the process is forked, and in only one direction. And some special (read: mysterious?) mechanism is invoked under the hood (hence my original question). The pipe, on the other hand, offers more flexibility (bi-directional; can be used as long as the two processes are alive) and so it is clear what is happening. But the price is evidently higher latency. – Donna Jan 25 '18 at 13:59
  • Look at shared memory again! – tripleee Jan 25 '18 at 14:00
  • Do you mean using `mp.Array`? I tried that but didn't post the results because it is extraordinarily slow. (Why??) – Donna Jan 25 '18 at 14:20
  • Hmm, no direct experience, but I was theorizing that it should be faster than a pipe. Oh well. https://stackoverflow.com/a/14145983/874188 vaguely suggests a possible workaround. – tripleee Jan 25 '18 at 14:22
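
On the `mp.Array` timing discussed in these comments, one possible factor (an assumption, not something measured in the question) is that `mp.Array` returns a synchronized wrapper by default, so iterating over it in `sum(xs)` acquires the lock once per element. A minimal sketch that compares the wrapper with the underlying ctypes array from `get_obj()`, using a smaller N than the question so that it finishes quickly and no child process so that only the element access is timed:

```python
import multiprocessing as mp
import random
import time

N = 2**14  # deliberately smaller than the question's 2**20
x = [random.random() for _ in range(N)]

xs = mp.Array('d', x)   # synchronized wrapper (lock=True is the default)
raw = xs.get_obj()      # underlying ctypes array, accessed without the lock

t0 = time.perf_counter()
s_locked = sum(xs)      # every element access goes through the lock
t1 = time.perf_counter()
s_raw = sum(raw)        # same data, no locking
t2 = time.perf_counter()

print("{:>20s} {:12.4f}".format("sum locked (ms)", 1000 * (t1 - t0)))
print("{:>20s} {:12.4f}".format("sum raw (ms)", 1000 * (t2 - t1)))
```

If the locked sum is much slower here, passing `lock=False` to `mp.Array` (or summing `xs.get_obj()` in the child) may be worth trying when no other process writes to the array at the same time.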
