I have several multiprocessing.Processes and would like them to consume (queue get()) callable non-picklable objects and call them. These were created before the fork(), so they shouldn't need pickling.

Using multiprocessing.Queue doesn't work as it tries to pickle everything:

import multiprocessing as mp

# create non-global callable to make it unpicklable
def make_callable():
    def foo():
        print("running foo")
    return foo

def bar():
    print("running bar")

def runall(q):
    while True:
        c = q.get()
        if c is None:
            break
        c()

if __name__ == '__main__':
    q = mp.Queue()
    call = make_callable()
    p = mp.Process(target=runall, args=(q,))
    p.start()
    q.put(bar)
    q.put(call)
    q.put(None)
    p.join()

Output:

running bar
Traceback (most recent call last):
  File "/usr/lib64/python3.7/multiprocessing/queues.py", line 236, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/usr/lib64/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'make_callable.<locals>.foo'

An equivalent implementation would be to put all objects into a global (or passed) list and pass just their indexes through the queue, which works:

import multiprocessing as mp

# create non-global callable to make it unpicklable
def make_callable():
    def foo():
        print("running foo")
    return foo

def bar():
    print("running bar")

def runall(q, everything):
    while True:
        c = q.get()
        if c is None:
            break
        everything[c]()

if __name__ == '__main__':
    q = mp.Queue()
    call = make_callable()
    everything = [bar, call]
    p = mp.Process(target=runall, args=(q,everything))
    p.start()
    q.put(0)
    q.put(1)
    q.put(None)
    p.join()

Output:

running bar
running foo

The problem is that while I know that none of the callables passed will be garbage collected (and thus their addresses will stay valid), I do not have the full list beforehand.

I also know I could probably use multiprocessing.Manager and its Queue implementation using a Proxy object, but this seems like a lot of overhead, especially as in the real implementation I would be passing other picklable data as well.

Is there a way to pickle and pass only the address reference to an object, shared across multiple processes?

Thanks!

Jiří J

2 Answers

It is true that objects put on a multiprocessing.Queue must be picklable. The pickle documentation explains:

Note that functions (built-in and user-defined) are pickled by “fully qualified” name reference, not by value. This means that only the function name is pickled, along with the name of the module the function is defined in. Neither the function’s code, nor any of its function attributes are pickled. Thus the defining module must be importable in the unpickling environment, and the module must contain the named object, otherwise an exception will be raised.

Picklable functions and classes must be defined in the top level of a module.
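
As a quick illustration of that rule, here is a minimal sketch that checks whether pickle accepts a given callable. The helper name can_pickle is made up for this example; bar and make_callable mirror the ones from the question.

```python
import pickle


def bar():
    print("running bar")


def make_callable():
    # foo is defined inside another function, so it has no importable
    # top-level name that pickle could reference
    def foo():
        print("running foo")
    return foo


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds (hypothetical helper)."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


if __name__ == "__main__":
    print(can_pickle(bar))              # top-level function
    print(can_pickle(make_callable()))  # local (nested) function
```

Run as a script, the first check should succeed and the second should fail, which matches the traceback in the question.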

So in your case, pass only top-level callables through the queue and add a workaround in the runall function: put the top-level factory make_callable on the queue and let runall call whatever callable it returns:

import multiprocessing as mp

# create non-global callable to make it unpicklable
def make_callable():
    def foo():
        print("running foo")
    return foo

def bar():
    print("running bar")

def runall(q):
    while True:
        c = q.get()
        if c is None:
            break

        res = c()
        # if c was a factory, also call the callable it returned
        if callable(res):
            res()


if __name__ == '__main__':
    q = mp.Queue()
    p = mp.Process(target=runall, args=(q,))
    p.start()

    q.put(bar)
    q.put(make_callable)
    q.put(None)

    p.join()
    q.close() 

The output:

running bar
running foo
RomanPerekhrest
  • Yes, that's the obvious solution to the obviously trivial example. However, in my case, I don't know and cannot guarantee that the callables my library gets are top-level and there's not a good reason to mandate picklability and limit the user. – Jiří J Jun 19 '19 at 20:14
  • @JiříJ, the rule is clear: whatever is passed must be picklable. And it's not a good thing if a user can pass "anything" to "anywhere". – RomanPerekhrest Jun 19 '19 at 20:22

After a bit of thinking and searching, I believe I have the answer I was looking for, mostly from the question "Get object by id()?".

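I could pass the id() of the callable and then translate it back into an object in the forked child process (in CPython, id() is the object's memory address, and the child gets a copy of the parent's memory):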

import ctypes
a = "hello world"
# CPython only: reinterpret the address returned by id() as a Python object
print(ctypes.cast(id(a), ctypes.py_object).value)
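
Putting this together with the queue from the question might look like the sketch below. It makes several assumptions: CPython, the fork start method (so not Windows), and a callable that already exists when the worker is forked. The names resolve and run_demo are made up for illustration, and a second queue is used only to get the result back to the parent.

```python
import ctypes
import multiprocessing as mp


def make_callable():
    def foo():
        return "running foo"
    return foo


def resolve(obj_id):
    # CPython-specific and unsafe: if obj_id is not the address of a live
    # object in this process, this can crash the interpreter
    return ctypes.cast(obj_id, ctypes.py_object).value


def runall(tasks, results):
    while True:
        obj_id = tasks.get()
        if obj_id is None:
            break
        results.put(resolve(obj_id)())


def run_demo():
    ctx = mp.get_context("fork")   # assumption: fork is available
    tasks, results = ctx.Queue(), ctx.Queue()
    call = make_callable()         # must exist before the fork
    p = ctx.Process(target=runall, args=(tasks, results))
    p.start()
    tasks.put(id(call))            # an int, so it pickles without trouble
    tasks.put(None)
    out = results.get(timeout=10)
    p.join()
    return out


if __name__ == "__main__":
    print(run_demo())
```

Only the integer crosses the process boundary. Anything created in the parent after the fork has no counterpart in the child, so its id() must not be sent.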

Or use the gc module and, as long as I keep a reference to the object alive, that should work too:

import gc

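def objects_by_id(id_):
    # gc.get_objects() only lists objects tracked by the garbage collector
    # (functions and containers are, plain strings and ints are not)
    for obj in gc.get_objects():
        if id(obj) == id_:
            return obj
    raise LookupError("No object found with id {}".format(id_))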

However, neither of these is very clean and, in the end, it may be worth imposing the limitation of collecting all the callables first and passing just their indexes.
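
A minimal sketch of that index-based variant, assuming every callable is registered before the worker processes are forked. The class name CallableRegistry is made up for this example, and queue.Queue stands in for mp.Queue so the sketch stays single-process.

```python
import queue


class CallableRegistry:
    """Maps small integer indexes to callables (hypothetical helper)."""

    def __init__(self):
        self._callables = []

    def register(self, func):
        # Return the index that should be sent through the queue
        if not callable(func):
            raise TypeError("expected a callable")
        self._callables.append(func)
        return len(self._callables) - 1

    def call(self, index):
        return self._callables[index]()


def runall(q, registry):
    # Same loop as in the question, but collecting the return values
    results = []
    while True:
        index = q.get()
        if index is None:
            break
        results.append(registry.call(index))
    return results


if __name__ == "__main__":
    registry = CallableRegistry()
    q = queue.Queue()  # stand-in for mp.Queue
    q.put(registry.register(lambda: "running lambda"))
    q.put(None)
    print(runall(q, registry))
```

The indexes are plain integers, so they pickle like any other data sent through the queue.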

Jiří J