11

I need to execute the code below (a simplified version of my real code base, Python 3.5):

import multiprocessing
def forever(do_something=None):
    while True:
        do_something()

p = multiprocessing.Process(target=forever, args=(lambda: print("do  something"),))
p.start()

In order to create the new process, Python needs to pickle the target function and the lambda passed in args. Unfortunately pickle cannot serialize lambdas, and the output looks like this:

_pickle.PicklingError: Can't pickle <function <lambda> at 0x00C0D4B0>: attribute lookup <lambda> on __main__ failed

I discovered cloudpickle, which can serialize and deserialize lambdas and closures using the same interface as pickle.

How can I force the Python multiprocessing module to use cloudpickle instead of pickle?

Clearly, hacking the code of the standard library's multiprocessing is not an option!

Thanks

Charlie

Charlie
  • 1,750
  • 15
  • 20

4 Answers

15

Try multiprocess. It's a fork of multiprocessing that uses the dill serializer instead of pickle -- there are no other changes in the fork.

I'm the author. I encountered the same problem as you several years ago, and ultimately I decided that hacking the standard library was my only choice, as some of the pickle code in multiprocessing is in C++.

>>> import multiprocess as mp
>>> p = mp.Pool()
>>> p.map(lambda x:x**2, range(4))
[0, 1, 4, 9]
>>> 
Mike McKerns
  • 33,715
  • 8
  • 119
  • 139
  • Mike, thanks for your suggestion. Indeed, your multiprocess+dill was my first attempt to tackle the problem. However, my code was working fine with Python's multiprocessing (even if without lambdas!), but did not work fully correctly with multiprocess and dill. I think this is surely my fault, due to how I manage multi-processing in my codebase. I think the only way to go is to use your multiprocess+dill and try to understand the problem in my code and eventually fix it! Thank you – Charlie Oct 27 '16 at 13:46
  • If you do find an issue with either module, then please do submit a bug report. – Mike McKerns Oct 27 '16 at 13:49
  • Mike, don't worry, I will definitely submit a bug report if I find any issue. Your library is extremely valuable for Python and I am happy to help! – Charlie Oct 27 '16 at 14:03
  • What have you done exactly in C++? You can just replace the pickle import with dill and build the module: see https://github.com/Rocamonde/multiprocessing_on_dill –  Sep 05 '18 at 20:56
  • Are there any flaws in doing such a replacement? –  Sep 05 '18 at 20:56
  • 1
    @J.C.Rocamonde: I'm not sure what your point/question is. The link you have given is essentially a dead project (3 yrs stale) that was not aware of the implementation in `multiprocess`. It's essentially a duplication of effort. Basically, yes, it's replacing `pickle` with `dill` everywhere. `multiprocess` also does the relevant replacements at the C++ layer. All modifications are tracked per python version in files like this one: https://github.com/uqfoundation/multiprocess/blob/master/py2.7/README_MODS – Mike McKerns Sep 06 '18 at 04:16
  • I updated it yesterday to Python 3.7. It's a fork of the original. I'm asking whether replacing the module import is enough or you have to substitute the C++ code as well. I'm asking because I've tested it and so far no error has occurred. Will the library be buggy without those replacements? –  Sep 06 '18 at 08:22
  • Is your module up to date with Python 3.7, by the way? –  Sep 06 '18 at 08:22
  • 2
    I saw what changes you made. Yes, `multiprocess` is up to date with all versions of Python, including 3.7. And yes, you need to substitute in the C layer, or in certain circumstances `pickle` will be used when `dill` is requested. My point is, there are no features in the library you forked that aren't in `multiprocess`, and `multiprocess` is better supported. I welcome you to contribute to `multiprocess`, or at least to fork it. – Mike McKerns Sep 06 '18 at 13:57
5

If you're willing to do a little monkeypatching, a quick fix is to swap out pickle.Pickler:

import pickle
import cloudpickle
pickle.Pickler = cloudpickle.Pickler

or, in more recent versions of Python, where the C implementation _pickle.Pickler is pulled in,

from multiprocessing import reduction
import cloudpickle
reduction.ForkingPickler = cloudpickle.Pickler

Just make sure to do this before importing multiprocessing. Here's a full example:

import pickle
import cloudpickle
pickle.Pickler = cloudpickle.Pickler

import multiprocessing as mp
mp.set_start_method('spawn', True)

def procprint(f):
    print(f())

if __name__ == '__main__':
    p = mp.Process(target=procprint, args=(lambda: "hello",))
    p.start()
    p.join()

As an aside, you won't need to do any of this if your start method is fork, since with forking nothing needs to be pickled in the first place.
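To illustrate that aside: a minimal sketch, assuming a POSIX system, where the same lambda goes through untouched because the forked child inherits the parent's memory instead of unpickling its arguments:

```python
# Sketch: with the 'fork' start method (POSIX only), Process arguments are
# inherited via fork() rather than pickled, so a lambda works with no patching.
import multiprocessing as mp

def procprint(f):
    print(f())

if __name__ == '__main__':
    ctx = mp.get_context('fork')   # raises ValueError on Windows
    p = ctx.Process(target=procprint, args=(lambda: "hello",))
    p.start()
    p.join()                       # the child prints "hello"
```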

Andy Jones
  • 4,723
  • 2
  • 19
  • 24
1

I was facing the same problem, so I made a small module which enables Python's multiprocessing to eat lambdas.

In case you have a lot of different unpicklable things, I would also recommend using dill or cloudpickle.

https://github.com/cloasdata/lambdser

pip install lambdser

seimen
  • 46
  • 8
0

I had a similar problem: I needed to send data to the workers that can be cloudpickled but not pickled with the standard pickle module. However, I wanted multiprocessing itself to keep working with the normal pickle module for various reasons. I used this pattern:

class FunctionWrapper:

    def __init__(self, fn):
        self.fn_ser = cloudpickle.dumps(fn)

    def __call__(self):
        fn = cloudpickle.loads(self.fn_ser)
        return fn()

Then you can pass your lambda (or whatever else is causing the problem) like this:

p = multiprocessing.Process(target=forever, args=(FunctionWrapper(lambda: print("do  something")),))

The point is that the 'meaningful' serialization happens outside the multiprocessing module, with whatever library you want. The pickle inside multiprocessing only ever sees a plain object with a bytes attribute.
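Putting the pieces together, a complete runnable sketch of this pattern, assuming cloudpickle is installed (the worker here runs the callable once instead of forever, so the example terminates):

```python
# Sketch: carry a cloudpickle-serialized callable through plain pickle.
import multiprocessing
import cloudpickle

class FunctionWrapper:
    """A plain-picklable shell around a cloudpickle-serialized callable."""

    def __init__(self, fn):
        self.fn_ser = cloudpickle.dumps(fn)   # serialize eagerly, in the parent

    def __call__(self):
        fn = cloudpickle.loads(self.fn_ser)   # rehydrate inside the worker
        return fn()

def run_once(do_something):
    do_something()

if __name__ == '__main__':
    p = multiprocessing.Process(
        target=run_once,
        args=(FunctionWrapper(lambda: print("do something")),),
    )
    p.start()
    p.join()
```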

julaine
  • 382
  • 3
  • 12