I've got a function `foo` which takes a small object and a large, constant object `big_object`. I'm using multiprocessing to process a list of the small objects, and I want to avoid pickling/unpickling `big_object` every time `foo` is called.
The `initializer` argument of `multiprocessing.Pool` seems like the right tool for this, but I can't get it to work: memory usage explodes. My current approach looks like this:
```python
import multiprocessing as mp

big_object = None

def foo(small_object):
    global big_object
    # ... do stuff with big_object
    return result

def init(big_object_arg):
    global big_object
    big_object = big_object_arg

def main():
    [...]
    with mp.Pool(4, initializer=init, initargs=(big_object,)) as pool:
        lst_results = pool.map(foo, lst_small_objects)
```
This runs, but memory usage explodes for some reason. Why could this be happening?
`big_object` is a custom C++ class exposed via pybind11, for which I have defined pickling functions; these are very slow, though.
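For reference, here is a stripped-down, self-contained version of the pattern above that I can run end to end. A plain Python list stands in for the pybind11 object, and `foo`'s body is just a placeholder computation:

```python
import multiprocessing as mp

big_object = None  # populated per worker by init()

def init(big_object_arg):
    # Runs once in each worker process; stores the large object in a
    # module-level global so foo() can use it without re-pickling it
    # on every task.
    global big_object
    big_object = big_object_arg

def foo(small_object):
    # Reads the worker-local copy of big_object.
    return small_object + len(big_object)

def main():
    big = list(range(1_000_000))  # stand-in for the pybind11 object
    lst_small_objects = [1, 2, 3]
    with mp.Pool(2, initializer=init, initargs=(big,)) as pool:
        return pool.map(foo, lst_small_objects)

if __name__ == "__main__":
    print(main())
```

With the plain list this behaves as expected; the blow-up only shows when the real pybind11 object is passed through `initargs`.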