What is being pickled when I call multiprocessing.Process?

Question

I know that multiprocessing uses pickling in order to have the processes run on different CPUs, but I think I am a little confused as to what is being pickled. Let's look at this code.

from multiprocessing import Process

def f(I):
    print('hello world!',I)

if __name__ == '__main__':
    for I in range(1, 3):
        Process(target=f,args=(I,)).start()

I assume what is being pickled is the def f(I) and the argument going in. First, is this assumption correct?

Second, let's say f(I) has a function call within it, like:

def f(I):
    print('hello world!',I)
    randomfunction()

Does randomfunction's definition get pickled as well, or is it only the function call?

Furthermore, if that function were defined in another file, would the process be able to call it?

dano · Accepted Answer · 2014-09-24T20:55:35.377

In this particular example, what gets pickled is platform-dependent. On systems that support os.fork, like Linux, nothing is pickled here. Both the target function and the args you're passing get inherited by the child process via fork.

On platforms that don't support fork, like Windows, the f function and args tuple will both be pickled and sent to the child process. The child process will re-import your __main__ module, and then unpickle the function and its arguments.

In either case, randomfunction is not actually pickled. When you pickle f, all you're really pickling is a pointer that the child process uses to rebuild the f function object. This is usually little more than a string that tells the child how to re-import f:

>>> def f(I):
...     print('hello world!',I)
...     randomfunction()
... 
>>> import pickle
>>> pickle.dumps(f)  # Python 2 output shown; Python 3 returns bytes in a newer protocol
'c__main__\nf\np0\n.'

The child process will just re-import f, and then call it. randomfunction will be accessible as long as it was properly imported into the original script to begin with.
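A minimal sketch of the by-reference behavior described above. It uses the standard-library function json.dumps as a stand-in for f (an assumption, chosen so the snippet behaves the same from a script, a REPL, or an import). The pickle holds the module and function names, and unpickling hands back the very same object rather than a copy of its code:

```python
import json
import pickle

# Pickle a module-level function. json.dumps stands in for the question's f.
payload = pickle.dumps(json.dumps)

# The payload carries the names needed to re-import the function ...
has_module_name = b"json" in payload
has_function_name = b"dumps" in payload

# ... and unpickling looks the function up again instead of rebuilding it.
restored = pickle.loads(payload)
same_object = restored is json.dumps

print(has_module_name, has_function_name, same_object)
```

Nothing that json.dumps calls internally appears in the payload either, which is the same reason randomfunction is never pickled.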

Note that in Python 3.4+, you can get the Windows-style behavior on Linux by using contexts:

import multiprocessing

ctx = multiprocessing.get_context('spawn')
ctx.Process(target=f, args=(I,)).start()  # even on Linux, this will use pickle
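For completeness, here is a self-contained sketch of the same idea as a script, assuming Python 3.4+. The helper name run_with_spawn is hypothetical and only there for illustration; f is the function from the question.

```python
import multiprocessing


def f(i):
    print('hello world!', i)


def run_with_spawn(values):
    """Start one spawned child per value and return the children's exit codes."""
    ctx = multiprocessing.get_context('spawn')
    procs = [ctx.Process(target=f, args=(v,)) for v in values]
    for p in procs:
        p.start()   # the process object, including target and args, is pickled here
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]


if __name__ == '__main__':
    # The guard matters with spawn: the child re-imports this module,
    # and without the guard it would try to start children of its own.
    print(run_with_spawn(range(1, 3)))
```

Because the child has to find f again by name, this only works when f lives at the top level of a module the child can import, which is the case when the snippet is saved and run as a script.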

The descriptions of the contexts are also probably relevant here, since the spawn and fork behaviors they describe apply to Python 2.x as well (there the platform decides which one you get):

spawn

The parent process starts a fresh python interpreter process. The child process will only inherit those resources necessary to run the process object's run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.

Available on Unix and Windows. The default on Windows.

fork

The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.

Available on Unix only. The default on Unix.

forkserver

When the program starts and selects the forkserver start method, a server process is started. From then on, whenever a new process is needed, the parent process connects to the server and requests that it fork a new process. The fork server process is single threaded so it is safe for it to use os.fork(). No unnecessary resources are inherited.

Available on Unix platforms which support passing file descriptors over Unix pipes.

Note that forkserver is only available in Python 3.4+; there's no way to get that behavior on 2.x, regardless of the platform you're on.
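Since availability and defaults differ by platform and Python version (the descriptions above are quoted from the 3.4-era documentation, and macOS, for instance, has defaulted to spawn since Python 3.8), it may be safer to ask the interpreter than to assume. A small sketch, where describe_start_methods is a hypothetical helper name:

```python
import multiprocessing


def describe_start_methods():
    """Return (available start methods, default start method) for this interpreter."""
    available = multiprocessing.get_all_start_methods()
    # Note: get_start_method() also fixes the default for this process
    # if no start method had been chosen yet.
    default = multiprocessing.get_start_method()
    return available, default


available, default = describe_start_methods()
print('available:', available)
print('default:  ', default)
```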

Great answer, but one note: In many cases where you think you need `spawn` (generally because you're using a library that expects to be able to talk to a specific event loop in the main thread), `forkserver` is actually better. — abarnert, Sep 24 '14 at 20:52
@abarnert Yeah, I had skipped `forkserver` altogether since it's 3.4+ only. I've edited it in, since it's relevant (and useful, as you pointed out) for any folks on Unix using 3.4+. — dano, Sep 24 '14 at 20:57
Doesn't `spawn` also require 3.4? IIRC, 3.3 had an undocumented OS X-specific hack for doing something similar, but that's not likely to help anyone except people with the specific bug that was a workaround for. — abarnert, Sep 24 '14 at 21:35
One more thing to note that probably won't affect the OP or any future askers, but just in case: `multiprocessing` extends `pickle` in a few ways (see [`multiprocessing/reduction.py`](https://hg.python.org/cpython/file/default/Lib/multiprocessing/reduction.py) for details) to add support for things like (IIRC) raw files, bound and unbound methods, `functools.partial`s, and `operator.itemgetter`s and `attrgetter`s, so the documentation on what can be pickled (and the output of `pickletools`) can be misleading. — abarnert, Sep 24 '14 at 21:40

score 3 · Answer 2 · answered Sep 24 '14 at 20:39

The function is pickled, but possibly not in the way you think of it.

You can look at what's actually in a pickle like this:

import pickle, pickletools
pickletools.dis(pickle.dumps(f))

I get:

 0: c    GLOBAL     '__main__ f'
12: p    PUT        0
15: .    STOP

You'll note that there is nothing in there corresponding to the code of the function. Instead, it has a reference to __main__ f, which is the module and name of the function. So when this is unpickled, it will always attempt to look up the f function in the __main__ module and use that. When you use the multiprocessing module, that ends up being a copy of the same function as it was in your original program.

This does mean that if you somehow modify which function is located at __main__.f, you'll end up unpickling a different function than the one you pickled.

Multiprocessing brings up a complete copy of your program, complete with all the functions you defined in it. So you can just call functions. The entire function isn't copied over, just the name of the function. The pickle module's assumption is that the function will be the same in both copies of your program, so it can just look up the function by name.
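The name-lookup caveat above can be reproduced in a single process. This is only a sketch: demo_mod is a hypothetical stand-in module registered by hand so that pickle can resolve it by name, and the two functions are illustrative.

```python
import pickle
import sys
import types

# Hypothetical stand-in module, registered so pickle can "import" it by name.
demo_mod = types.ModuleType("demo_mod")
sys.modules["demo_mod"] = demo_mod


def original():
    return "original"


def replacement():
    return "replacement"


# Make `original` look as if it had been defined as demo_mod.f.
original.__module__ = "demo_mod"
original.__qualname__ = "f"
demo_mod.f = original

payload = pickle.dumps(original)    # stores only the reference demo_mod.f

demo_mod.f = replacement            # rebind the name before unpickling
restored = pickle.loads(payload)    # looks up demo_mod.f again

print(restored())
```

The unpickled object is whatever demo_mod.f names at load time, so the call goes to replacement even though original was the function that was pickled.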

score -1 · Answer 3 · answered Sep 24 '14 at 20:42

Only the function arguments (I,) and the return value of the function f are pickled. The actual definition of the function f has to be available when loading the module.

The easiest way to see this is through the code:

from multiprocessing import Process

if __name__ == '__main__':
    def f(I):
        print('hello world!',I)

    for I in [1,2,3]:
        Process(target=f,args=(I,)).start()

On Windows, that fails with:

AttributeError: 'module' object has no attribute 'f'

gdanezis

The return value of `f` is not sent back to the parent process, so it's never pickled. Also, `f` *is* pickled and sent to the child. Unpickling `f` requires being able to import it from the top level of the module, though, which is why you see that error. All of this also only applies if using Windows (unless you're using Python 3.x and the `'spawn'` context on Posix). – dano Sep 24 '14 at 20:44
Aha -- I was thinking of the multiprocessing `map` function. That surely has to pickle the return value? – gdanezis Sep 24 '14 at 20:48
Yep, on all platforms, all calls to `multiprocessing.Pool` instance methods (like `map` and `apply`) will pickle the arguments and return values; `fork` can't help you once the child process has already been started. – dano Sep 24 '14 at 20:49
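The requirement from the comments above, that a pickled function must be reachable by name from the top level of a module, can be checked without starting any process. A rough sketch, where can_pickle and make_local are hypothetical helper names; it shows a related, simpler case than the `if __name__ == '__main__':` one in the answer, since here pickle already refuses on the sending side:

```python
import pickle


def make_local():
    def local(i):   # defined inside a function, not at module top level
        return i
    return local


def can_pickle(obj):
    """Return True if pickle can serialise obj, False if it refuses."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # The exact exception type has varied between Python versions.
        return False


print(can_pickle(len))             # a builtin, importable by name
print(can_pickle(make_local()))    # no module-level name to look up
print(can_pickle(lambda i: i))     # same problem for a lambda
```

Anything that fails this check also cannot be passed as a target or argument when the child is spawned, or through `multiprocessing.Pool` on any platform.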
