
I run into problems with every efficient multiprocessing framework in Python (no matter whether scoop or multiprocessing).

I have the following situation:

  1. One class 'Foo' which holds a function 'f'
  2. A second class 'Bar' which gets the arguments (kwargs) and holds an instance of class 'Foo' (containing the function)
  3. In 'Bar', the function of class Foo is executed using the arguments given.
  4. The results are, for reasons of statistical significance, averaged over multiple runs, in this case 10 times (not shown in given example).

Here is the example:

import multiprocessing as mp

class Foo:
    def __init__(self, f):
        self.f = f

class Bar:
    def __init__(self, foo, **kwargs):
        self.args = kwargs
        self.foo = foo

    def execute(self):
        pool = mp.Pool(5)
        f = lambda x : self.foo.f(**x)
        args = [self.args] * 10
        results = pool.map(f, args)

if __name__ == '__main__':
    def anything(**kwargs):
        print(kwargs['z'])
        return kwargs['x'] * kwargs['y']
    foo = Foo(anything)
    args = {'x':10, 'y':27, 'z':'Hello'}
    bar = Bar(foo, **args)
    bar.execute()

I know that functions must be defined at module level in order to be picklable. Is there any way to make the function picklable? Unfortunately, I am not very experienced in Python OOP, so I am probably missing an important point! Thank you!

EDIT: Unfortunately, even with the module "multiprocess", which uses dill instead of pickle (thanks to Mike McKerns), my problem is not guaranteed to be solved. For some short runs of my program, things are fine. For some reason, multiprocess seems to produce race conditions, as I get the following error:

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/dist-packages/multiprocess/pool.py", line 389, in _handle_results
    task = get()
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 209, in loads
    return load(file)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 199, in load
    obj = pik.load()
  File "/usr/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1096, in load_global
    klass = self.find_class(module, name)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 353, in find_class
    return StockUnpickler.find_class(self, module, name)
  File "/usr/lib/python2.7/pickle.py", line 1132, in find_class
    klass = getattr(mod, name)
AttributeError: 'module' object has no attribute 'Individual'

(Individual is a class which is used by my program [genetic algorithm using deap]) Any idea?

Robin
  • One error (not really related to your question) is that your `__init__` on `Bar` needs to have the `**kwargs` come last in the method – sedavidw Jan 05 '16 at 18:05
  • substitute the `multiprocess` package for `multiprocessing` and the code should work with no other changes. And if you post code that demonstrates how you intend to use `Foo` and `Bar`, I could clarify further. – Mike McKerns Jan 05 '16 at 18:41
  • Great, this works! Do you have any further information on that? Why does this one work? Any drawbacks? Why should I ever struggle with multiprocessing? – Robin Jan 05 '16 at 18:47
  • Found the code on pip, sorry – Robin Jan 05 '16 at 18:48
  • `multiprocess` is a fork of `multiprocessing` that uses `dill` instead of `pickle`… so you are able to serialize almost anything in python, including stuff you write in the interpreter session. That's the only change. See http://stackoverflow.com/a/21345273/2379433 and http://stackoverflow.com/a/21345308/2379433 and http://stackoverflow.com/a/21345423/2379433 and etc. – Mike McKerns Jan 05 '16 at 18:49
  • Thank you very much! That was the missing pointer! – Robin Jan 05 '16 at 19:03
  • Oh, you are using `deap`… why? for the GPU parallel computing, or otherwise? Because if you are looking for a better parallel optimizer, then you could try `mystic` -- which has been tested extensively with `dill` and `multiprocess` (as they are all my packages). You'd have to get `mystic` from github, however, as the released version is stale. – Mike McKerns Jan 06 '16 at 21:16
  • With regard to your edit: it looks like you are dynamically adding/deleting an attribute. Are you creating `Individual` in `__main__`, or elsewhere? Is the module a globally installed module, or is it just referenced via local import? These variants may affect the pickling/unpickling for modules. No idea why it would fail intermittently, as I don't know exactly how you are using `multiprocess`… so I can't test it. I don't know if `deap` does something funny with the parallel map also… that's possible from what I know of it (again, I use `mystic` instead). – Mike McKerns Jan 06 '16 at 21:22
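
The last comment's point about dynamically added or deleted attributes can be illustrated with plain `pickle`. This is only a sketch of the general mechanism, not a diagnosis of the traceback above: pickle stores a class by reference (module name plus class name), so unpickling fails with an `AttributeError` when that name is missing from the module at load time. The module name `demo_mod` and the list-based `Individual` class are hypothetical stand-ins, and the sketch assumes Python 3.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a module that receives classes at runtime.
demo_mod = types.ModuleType("demo_mod")
sys.modules["demo_mod"] = demo_mod

# Create a class dynamically and attach it to the module.
Individual = type("Individual", (list,), {"__module__": "demo_mod"})
demo_mod.Individual = Individual

# Pickling records only "demo_mod" and "Individual", not the class body.
payload = pickle.dumps(Individual([1, 2, 3]))

# While the attribute exists, unpickling works.
restored = pickle.loads(payload)

# Once the attribute is gone, the same payload can no longer be loaded.
del demo_mod.Individual
error = None
try:
    pickle.loads(payload)
except AttributeError as exc:
    error = exc

print(type(restored).__name__, list(restored))
print(type(error).__name__)
```

If a class is created in only one process, or is created after data that references it has been exchanged, a lookup failure of this kind is one possible outcome.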

1 Answer


(repeating the comments above)

Substitute the multiprocess package for multiprocessing and the code should work with no other changes. This is because multiprocess is a fork of multiprocessing that uses dill instead of pickle… so you are able to serialize almost anything in python, including stuff you write in the interpreter session. That's the only change made for the fork of multiprocessing.
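
To see why the standard library version fails in the first place, here is a minimal sketch using only `pickle` (assuming Python 3). The helper `can_pickle` and the function `double` are hypothetical names introduced for illustration. Pickle serializes a function by looking up its name in its module, which works for a module-level function but is expected to fail for a lambda, since a lambda has no importable name.

```python
import pickle


def double(x):
    """An ordinary module-level function."""
    return x * 2


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == '__main__':
    # A named module-level function versus an anonymous lambda.
    print(can_pickle(double))
    print(can_pickle(lambda x: x * 2))
```

`multiprocessing.Pool.map` has to pickle the callable it sends to the workers, which is why the lambda in `Bar.execute` is rejected.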

See https://stackoverflow.com/a/21345273/2379433 and https://stackoverflow.com/a/21345308/2379433 and https://stackoverflow.com/a/21345423/2379433 and etc.
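
If switching packages is not an option, one alternative (not part of the answer above, just a sketch using the standard library on Python 3) is to replace the lambda with a module-level helper bound via `functools.partial`. The names `call_with_kwargs` and `run` are hypothetical. This only works when the wrapped function itself is importable at module level.

```python
import multiprocessing as mp
from functools import partial


def call_with_kwargs(func, kwargs):
    """Module-level replacement for `lambda x: self.foo.f(**x)`."""
    return func(**kwargs)


def anything(**kwargs):
    """The example function from the question, moved to module level."""
    print(kwargs['z'])
    return kwargs['x'] * kwargs['y']


def run(func, kwargs, repeats=10, workers=5):
    """Call func(**kwargs) `repeats` times in a pool and collect the results."""
    # partial objects pickle fine as long as their pieces are picklable.
    task = partial(call_with_kwargs, func)
    with mp.Pool(workers) as pool:
        return pool.map(task, [kwargs] * repeats)
```

Calling `run(anything, {'x': 10, 'y': 27, 'z': 'Hello'})` from under an `if __name__ == '__main__':` guard should return a list of ten results, each 270. The same idea could be moved into `Bar.execute` by building the `partial` from `self.foo.f`.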

Mike McKerns
  • Unfortunately, the problem occurs again, this time it seems to me like race conditions (as it occurs only occasionally). I edited my post, maybe you already had the same problem? – Robin Jan 06 '16 at 19:35
  • See my reply in the main comments. Basically, if you are now asking about parallel computing with `deap`, that is a follow-up question, not the same question. That's a bit more complicated, and you should provide code to demonstrate your issue. I have similar code to yours above that runs in the GA in `mystic`… so post some more info/code. – Mike McKerns Jan 07 '16 at 00:10