
I've got a function foo that takes a small object and a large one, big_object. The large one is constant across calls. I'm using multiprocessing to process a list of the small objects, and I want to avoid pickling/unpickling big_object each time foo is called.

It seems like the initializer argument of multiprocessing.Pool should be useful here, but I can't get it to work: memory explodes. My current approach looks like this:

import multiprocessing as mp

big_object = None

def foo(small_object):
    global big_object
    # ... do stuff with big_object ...
    return result

def init(big_object_arg):
    global big_object
    big_object = big_object_arg

def main():
    [...]
    with mp.Pool(4, initializer=init, initargs=(big_object,)) as pool:
        lst_results = pool.map(foo, lst_small_objects)

This runs, but memory usage explodes for some reason. Why could this be happening?

big_object is a custom C++ class exposed via pybind11, for which I have defined pickling functions. These are very slow, though.
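
As an aside, if fork is available (it's the default start method on Linux), the pickling could be avoided entirely by building big_object in the parent before the pool is created, so the workers inherit the global copy-on-write. A minimal sketch; make_big_object and load_small_objects are placeholders for my real code:

import multiprocessing as mp

big_object = None  # set in the parent before the pool is created

def foo(small_object):
    # Forked workers see the parent's global directly, so big_object
    # is never pickled. Copy-on-write still applies: pages a worker
    # writes to get duplicated, but a read-only C++ object mostly won't be.
    return small_object  # placeholder for real work using big_object

def main():
    global big_object
    big_object = make_big_object()            # placeholder constructor
    lst_small_objects = load_small_objects()  # placeholder loader

    ctx = mp.get_context("fork")  # not available on Windows
    with ctx.Pool(4) as pool:
        lst_results = pool.map(foo, lst_small_objects)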

Nimitz14
  • Have you looked at _what_ causes the memory to explode? See https://stackoverflow.com/questions/110259/which-python-memory-profiler-is-recommended for ways to do that – MatsLindh Jul 29 '20 at 09:26
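
(One minimal way to act on that suggestion, assuming the third-party psutil package, is to log each process's resident memory from inside init and foo:)

import os
import psutil  # third-party: pip install psutil

def log_rss(tag):
    # Print the resident set size (RSS) of the current process in MiB.
    rss = psutil.Process(os.getpid()).memory_info().rss
    print(f"[{tag}] pid={os.getpid()} rss={rss / 2**20:.1f} MiB")

Calling log_rss("after init") inside init and log_rss("in foo") inside foo shows which processes hold copies of big_object, and when they appear.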

1 Answer


So, the memory blowing up was fine and actually expected: initargs are pickled and sent to every worker, so each of the four pool processes ends up holding its own full copy of big_object in addition to the parent's. At the same time, the code was not actually running correctly, because there was a bug in the pickling code for my custom C++ object.

In other words, the code I posted is fine. That works!
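
For anyone hitting something similar: a broken pickling implementation can be caught before involving the pool by round-tripping the object through pickle in a single process. A minimal sketch; make_big_object and describe are placeholders for the real pybind11 class:

import pickle

def check_picklable(obj):
    # Round-trip through pickle, which is what multiprocessing does
    # when it sends initargs to each worker.
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    return pickle.loads(data)

if __name__ == "__main__":
    big_object = make_big_object()  # placeholder constructor
    restored = check_picklable(big_object)
    # Compare whatever state matters for the class:
    assert restored.describe() == big_object.describe()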

Nimitz14