
I want to parallelize a large for loop, and from what I have read, the best way to do this is with the multiprocessing module from Python's standard library.

I have a list of around 40,000 objects, and I want to process them in parallel in a separate class, mainly because of what I read here.

In one class I keep all the objects in a list, and I want to use multiprocessing.Pool and Pool.map to run a parallel computation on each object by passing it through another class that returns a value.

# ... some class that generates the list_objects
pool = multiprocessing.Pool(4)
results = pool.map(Parallel, self.list_objects)

Then I have a class that should process each object passed in by pool.map:

class Parallel(object):
    def __init__(self, args):
        self.some_variable          = args[0]
        self.some_other_variable    = args[1]
        self.yet_another_variable   = args[2]
        self.result                 = None

    def __call__(self):
        self.result                 = self.calculate(self.some_variable)

I added the __call__ method because of the post I linked above, but I'm not sure I'm using it correctly: it seems to have no effect, and self.result never gets computed.

Any suggestions? Thanks!

Asked by Michael Gradek

  • You're making your life needlessly difficult, but if you're determined to do it this way, move the body of `__call__()` into your `__init__()` method. Just like without `multiprocessing`, `Parallel()` only constructs an object - it never goes on to invoke `__call__()` too. You'd need to do `Parallel()()` to get `__call__()` invoked. – Tim Peters Dec 27 '13 at 03:10
  • You're right. I wasn't too sure how to approach `multiprocessing`, but I've just rewritten my code to have no classes at all and it works fine! Thanks! – Michael Gradek Dec 27 '13 at 09:17
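A minimal sketch of Tim Peters's suggestion, moving the work from `__call__` into `__init__` so that simply constructing `Parallel(args)` does the computation. It assumes each item of `list_objects` is a 3-tuple (matching the `args[0]`..`args[2]` indexing in the question), and `calculate` is a hypothetical placeholder that squares its input. Each worker returns the constructed instance, which is pickled back to the parent, so `.result` is readable there.

```python
import multiprocessing


class Parallel(object):
    def __init__(self, args):
        self.some_variable = args[0]
        self.some_other_variable = args[1]
        self.yet_another_variable = args[2]
        # Do the work here: calling Parallel(args) runs __init__, never __call__.
        self.result = self.calculate(self.some_variable)

    def calculate(self, value):
        # Hypothetical placeholder for the real computation.
        return value * value


if __name__ == "__main__":
    # Hypothetical input: one 3-tuple per object to process.
    list_objects = [(i, "a", "b") for i in range(10)]
    with multiprocessing.Pool(4) as pool:
        # Each worker builds one Parallel instance and sends it back pickled.
        instances = pool.map(Parallel, list_objects)
    print([p.result for p in instances])
```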

2 Answers


Use a plain function, not a class, when possible. Use a class only when there is a clear advantage to doing so.
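A minimal sketch of the plain-function version, assuming each item is a 3-tuple as in the question and using a hypothetical `calculate` that squares the first field. The worker is defined at module level so `Pool.map` can pickle it by name, and the pool is created under an `if __name__ == "__main__":` guard, which the `spawn` start method (the default on Windows and macOS) requires.

```python
import multiprocessing


def calculate(value):
    # Hypothetical placeholder for the real per-object computation.
    return value * value


def process(args):
    # One item from the list; only the first field is used in this sketch.
    some_variable = args[0]
    return calculate(some_variable)


if __name__ == "__main__":
    list_objects = [(i, "a", "b") for i in range(10)]
    with multiprocessing.Pool(4) as pool:
        # pool.map returns results in the same order as list_objects.
        results = pool.map(process, list_objects)
    print(results)
```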

If you really need to use a class, then given your setup, pass an instance of Parallel:

results = pool.map(Parallel(args), self.list_objects)

Since the instance has a __call__ method, the instance itself is callable, like a function.


Note that __call__ must also accept an argument, the item being processed:

def __call__(self, val):

since pool.map essentially does the following, only in parallel:

p = Parallel(args)
result = []
for val in self.list_objects:
    result.append(p(val))
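A runnable sketch of the callable-instance approach, assuming the constructor takes shared settings (here a hypothetical `scale` factor) and each list item is passed to `__call__`. `__call__` returns its value rather than storing it: each worker operates on its own pickled copy of the instance, so assigning to `self.result` there would not be visible in the parent process.

```python
import multiprocessing


class Parallel(object):
    def __init__(self, scale):
        # Shared configuration, set once and copied to every worker.
        self.scale = scale

    def __call__(self, val):
        # Called once per item; the return value is what pool.map collects.
        return self.calculate(val)

    def calculate(self, val):
        # Hypothetical placeholder computation.
        return val * self.scale


if __name__ == "__main__":
    list_objects = list(range(10))
    p = Parallel(3)
    with multiprocessing.Pool(4) as pool:
        results = pool.map(p, list_objects)
    # Same values as the sequential loop over p(val).
    assert results == [p(val) for val in list_objects]
    print(results)
```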
Answered by unutbu
  • Won't this call `__call__` `len(self.list_objects)` times on the same instance, as opposed to calling it on each one? – loopbackbee Dec 27 '13 at 10:55
  • Yes. Since `args` is not specified as changing, I think this is what the OP intends. – unutbu Dec 27 '13 at 11:00
  • Oh -- now I see -- maybe the `args` in `Parallel(args)` was supposed to be the items in `self.list_objects`. In that case Tim Peters gave the class-based solution in the comments above. But really, the best solution is to just use a plain function. – unutbu Dec 27 '13 at 11:10
  • Thanks everyone! You were right, I rewrote it to use plain functions, no classes, and it works perfectly. It's actually really simple without classes. – Michael Gradek Dec 27 '13 at 14:29

Pool.map simply applies a function (or, more generally, any callable) to each item in parallel. It has no notion of objects or classes. Since you pass it a class, calling the class just constructs an instance (running __init__); __call__ is never executed. You need to either call it explicitly from __init__ or use pool.map(Parallel.__call__, preinitialized_objects).
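A sketch of the second option, assuming a reasonably recent Python 3, where `Parallel.__call__` is a plain function that `pickle` can locate by its qualified name (in Python 2 it is an unbound method, which cannot be pickled). The objects are constructed up front, and `__call__` is changed to also return the result, since an attribute set on a worker's copy never reaches the parent. `calculate` is a hypothetical placeholder.

```python
import multiprocessing


class Parallel(object):
    def __init__(self, args):
        self.some_variable = args[0]
        self.some_other_variable = args[1]
        self.yet_another_variable = args[2]
        self.result = None

    def __call__(self):
        # Return the value: setting self.result only changes the worker's
        # copy of the object, not the one held by the parent process.
        self.result = self.calculate(self.some_variable)
        return self.result

    def calculate(self, value):
        # Hypothetical placeholder for the real computation.
        return value + 1


if __name__ == "__main__":
    # Build the objects first, then let each worker invoke __call__ on one.
    preinitialized_objects = [Parallel((i, "a", "b")) for i in range(10)]
    with multiprocessing.Pool(4) as pool:
        # Parallel.__call__ is used as a function whose single argument is self.
        results = pool.map(Parallel.__call__, preinitialized_objects)
    print(results)
```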

Answered by loopbackbee