Parallel processing loop using multiprocessing Pool

Question

I want to process a large for loop in parallel, and from what I have read the best way to do this is to use the multiprocessing library that comes standard with Python.

I have a list of around 40,000 objects, and I want to process them in parallel in a separate class. The reason for doing this in a separate class is mainly because of what I read here.

In one class I have all the objects in a list and via the multiprocessing.Pool and Pool.map functions I want to carry out parallel computations for each object by making it go through another class and return a value.

# ... some class that generates the list_objects
pool = multiprocessing.Pool(4)
results = pool.map(Parallel, self.list_objects)

And then I have a class which I want to process each object passed by the pool.map function:

class Parallel(object):
    def __init__(self, args):
        self.some_variable          = args[0]
        self.some_other_variable    = args[1]
        self.yet_another_variable   = args[2]
        self.result                 = None

    def __call__(self):
        self.result                 = self.calculate(self.some_variable)

The reason I have a `__call__` method is the post I linked before, yet I'm not sure I'm using it correctly, as it seems to have no effect: the self.result value is never generated.

Any suggestions? Thanks!

You're making your life needlessly difficult, but if you're determined to do it this way, move the body of `__call__()` into your `__init__()` method. Just like without `multiprocessing`, `Parallel()` only constructs an object - it never goes on to invoke `__call__()` too. You'd need to do `Parallel()()` to get `__call__()` invoked. — Tim Peters, Dec 27 '13 at 03:10
You're right. I wasn't too sure how to approach `multiprocessing`, but I've just rewritten my code to have no classes at all and it works fine! Thanks! — Michael Gradek, Dec 27 '13 at 09:17
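To illustrate the point in the comment above, here is a small sketch without `multiprocessing` showing that constructing `Parallel` does not invoke `__call__`. The `calculate` body, the sample values, and the `ParallelEager` variant (the body of `__call__` moved into `__init__`) are assumptions made for illustration, not the asker's real code.

```python
class Parallel(object):
    def __init__(self, args):
        self.some_variable = args[0]
        self.result = None

    def calculate(self, value):
        # Placeholder computation (assumption, not from the question).
        return value * 2

    def __call__(self):
        self.result = self.calculate(self.some_variable)


class ParallelEager(Parallel):
    # Hypothetical variant: the work happens during construction.
    def __init__(self, args):
        super(ParallelEager, self).__init__(args)
        self.result = self.calculate(self.some_variable)


constructed_only = Parallel((21,))  # __init__ runs, __call__ does not
called = Parallel((21,))
called()                            # the explicit second call, i.e. Parallel(args)()

# Mapping the class over items only constructs instances, so the eager
# variant is the one that ends up with a populated result.
eager = [ParallelEager(item) for item in [(1,), (2,)]]

print(constructed_only.result, called.result, [p.result for p in eager])
```

With `pool.map`, the same logic applies: the pool calls whatever it is given once per item, and calling a class means constructing it.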

unutbu · Accepted Answer · 2013-12-26T19:24:27.113

score 3

Use a plain function, not a class, when possible. Use a class only when there is a clear advantage to doing so.
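A minimal sketch of the plain-function approach, assuming Python 3 and assuming each item in the list is a tuple of three values, as the `args[0]` to `args[2]` indexing in the question suggests. The names `process_item` and `calculate`, the arithmetic, and the sample data are placeholders rather than the asker's real computation.

```python
import multiprocessing


def calculate(value):
    # Placeholder for the real per-object computation (assumption).
    return value * value


def process_item(args):
    # Each item is assumed to be a 3-tuple; only the first field is used,
    # mirroring self.calculate(self.some_variable) in the question.
    some_variable, some_other_variable, yet_another_variable = args
    return calculate(some_variable)


if __name__ == "__main__":
    # Hypothetical stand-in for the list of around 40,000 objects.
    list_objects = [(i, i + 1, i + 2) for i in range(10)]
    with multiprocessing.Pool(4) as pool:
        # pool.map returns the results in the same order as the input.
        results = pool.map(process_item, list_objects)
    print(results)
```

The function is defined at module level so that the worker processes can find it by name, and the `if __name__ == "__main__":` guard keeps the pool from being created again when a worker imports the module.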

If you really need to use a class, then given your setup, pass an instance of Parallel:

results = pool.map(Parallel(args), self.list_objects)

Since the instance has a __call__ method, the instance itself is callable, like a function.

By the way, the __call__ needs to accept an additional argument:

def __call__(self, val):

since pool.map is essentially going to do the following, but in parallel:

p = Parallel(args)
result = []
for val in self.list_objects:
    result.append(p(val))
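Put together, the class-based version might look like the sketch below, assuming Python 3. The `offset` field and the addition are hypothetical stand-ins for whatever shared configuration and computation the real class would hold.

```python
import multiprocessing


class Parallel(object):
    def __init__(self, args):
        # Shared configuration, set once; the instance is pickled and
        # sent to the workers along with each task.
        self.offset = args[0]  # hypothetical field

    def __call__(self, val):
        # Return the value: attributes set here would only change the
        # worker's copy of the instance, not the one in the parent.
        return val + self.offset


if __name__ == "__main__":
    p = Parallel((100,))
    with multiprocessing.Pool(4) as pool:
        # The single instance p is called once per item in the list.
        results = pool.map(p, [1, 2, 3])
    print(results)
```

This fits the case where `args` stays fixed and only `val` varies from call to call.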

edited Dec 26 '13 at 19:24

answered Dec 26 '13 at 19:17


Won't this call `__call__` `len(self.list_objects)` times on the same instance, as opposed to calling it on each one? – loopbackbee Dec 27 '13 at 10:55
Yes. Since `args` is not specified as changing, I think this is what the OP intends. – unutbu Dec 27 '13 at 11:00
Oh -- now I see -- maybe the `args` in `Parallel(args)` was supposed to be the items in `self.list_objects`. In that case Tim Peters gave the class-based solution in the comments above. But really, the best solution is to just use a plain function. – unutbu Dec 27 '13 at 11:10
Thanks everyone! You were right, I rewrote it to be plain functions, no classes, and it works perfectly. It's actually really simple without classes. – Michael Gradek Dec 27 '13 at 14:29

score 2 · Answer 2 · answered Dec 26 '13 at 19:11

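Pool.map simply applies a function (actually, any callable) to each item in parallel. It has no notion of objects or classes. Since you pass it a class, calling that class only runs __init__; __call__ is never executed. You need to either call it explicitly from __init__ or use pool.map(Parallel.__call__, preinitialized_objects). In the second case __call__ must return the value, because attributes set on self inside a worker process are not copied back to the original objects.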
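A rough sketch of the `pool.map(Parallel.__call__, preinitialized_objects)` option, assuming Python 3, where `Parallel.__call__` is an ordinary function that can be pickled by its qualified name. The `calculate` body and the sample inputs are placeholders.

```python
import multiprocessing


class Parallel(object):
    def __init__(self, args):
        self.some_variable = args[0]

    def calculate(self, value):
        # Placeholder computation (assumption).
        return value + 1

    def __call__(self):
        # Return the value so pool.map can send it back to the parent.
        return self.calculate(self.some_variable)


if __name__ == "__main__":
    # Hypothetical inputs: one pre-built instance per item.
    preinitialized_objects = [Parallel((i,)) for i in range(5)]
    with multiprocessing.Pool(4) as pool:
        # Each worker receives one pickled instance and uses it as `self`.
        results = pool.map(Parallel.__call__, preinitialized_objects)
    print(results)
```

Every instance is pickled and shipped to a worker here, so this is likely to cost more than passing plain data to a module-level function.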
