
My question is about parallelizing Python code: I want to know how we can run a function on different instances of a class in parallel to decrease the runtime.

What I have: I have multiple instances of a class A (stored in a list called instances). This class has a function add. Now we have multiple independent tasks, one per instance of class A, and the input to all of these tasks is the same (the number n in my example). Each instance needs to apply the function add to n and return a number. We want to store the numbers returned by all instances in a list (the list results in my example).

What I want: As you can see, in this example the tasks can be parallelized, as there is no need for one to wait for another to get done. How can we parallelize the simple code below? Since nothing is shared between the different instances, I guess we can even use multithreading, right? Or is the only way to use multiprocessing?

class A(object):
    def __init__(self, q):
        self.p = q

    def add(self, num):
        return self.p + num


instances = []
for i in range(5):
    instances.append(A(i))
n = 20
results = []
for inst in instances:
    results.append(inst.add(n))

print(results)

Output: [20, 21, 22, 23, 24]

user491626
  • For problems this simple it doesn't make any sense at all to use multiple threads. If you're having a specific problem, please describe it. This rather broad question with toy code is not really useful. – moooeeeep Jul 17 '18 at 20:34
  • @moooeeeep I have a very complicated task (for each instance of the class to accomplish) but if I know how to use multithreading for this simple example, I should be able to use it for my task. – user491626 Jul 17 '18 at 20:37
  • Related: https://stackoverflow.com/q/2846653/1025391 and https://stackoverflow.com/q/1294382/1025391 and https://stackoverflow.com/q/9038711/1025391 – moooeeeep Jul 17 '18 at 20:37
  • @moooeeeep I have already seen the first link, thanks though – user491626 Jul 17 '18 at 20:40

1 Answer


The pattern your toy code follows suggests mapping a wrapper function over the list of instances using a thread pool or process pool. However, the small number of instances and the basic arithmetic operation you want to apply to each of them suggest that the overhead of parallelizing this would outweigh any potential benefit.

Whether it makes sense to do this depends on the number of instances and on how long each of those member-function calls takes. So make sure to do at least some basic profiling of your code before you try to parallelize it, and find out whether the tasks you want to parallelize are CPU-bound or IO-bound.
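For example, here is a rough timing sketch (not taken from the question; `time.sleep` and the `task` function are made up, standing in for a slow, IO-bound piece of work) showing the kind of measurement worth doing before deciding:

    # Rough timing sketch: time.sleep stands in for a slow, IO-bound task.
    # Compare the serial loop against a thread pool before committing to either.
    import time
    from multiprocessing.dummy import Pool  # thread-based pool

    def task(x):
        time.sleep(0.1)  # pretend this is slow IO (network, disk, ...)
        return x + 20

    items = list(range(5))

    start = time.time()
    serial_results = [task(x) for x in items]
    print('serial : %.3f s' % (time.time() - start))   # roughly 0.5 s

    pool = Pool(5)
    start = time.time()
    threaded_results = pool.map(task, items)
    print('threads: %.3f s' % (time.time() - start))   # roughly 0.1 s
    pool.close()
    pool.join()

For a purely CPU-bound operation like the toy add, the pooled version will typically come out slower because of the pool overhead (and, for threads, the global interpreter lock).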

Here's an example that should demonstrate the basic pattern:

# use multiprocessing.Pool for a processes-based worker pool
# use multiprocessing.dummy.Pool for a thread-based worker pool
from multiprocessing.dummy import Pool
# make up list of instances
l = [list() for i in range(5)]
# function that calls the method on each instance
def foo(x):
    x.append(20)
    return x
# actually call functions and retrieve list of results
p = Pool(3)
results = p.map(foo, l)
print(results)

Obviously you need to fill in the blanks to adapt this to your real code.
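For the concrete class A from the question, the adaptation might look like the sketch below. `call_add` is a made-up wrapper name; a module-level function is used instead of a lambda so that the same code should also work with the process-based Pool, which needs to pickle the callable:

    from multiprocessing.dummy import Pool  # swap for multiprocessing.Pool for processes

    class A(object):
        def __init__(self, q):
            self.p = q

        def add(self, num):
            return self.p + num

    instances = [A(i) for i in range(5)]
    n = 20

    # module-level wrapper so it stays picklable for a process-based Pool
    def call_add(inst):
        return inst.add(n)

    pool = Pool(3)
    results = pool.map(call_add, instances)
    pool.close()
    pool.join()

    print(results)  # [20, 21, 22, 23, 24]

With the process-based Pool you would also want to guard the pool creation with `if __name__ == '__main__':`, especially on Windows.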

For further reading, also maybe have a look at the futures module (`concurrent.futures`).
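A minimal sketch with `concurrent.futures` (standard library in Python 3; for Python 2 it is available as the `futures` backport on PyPI), reusing the `instances` list and `call_add` wrapper from the previous snippet:

    from concurrent.futures import ThreadPoolExecutor  # or ProcessPoolExecutor

    # assumes the `instances` list and the `call_add` wrapper defined above
    with ThreadPoolExecutor(max_workers=3) as executor:
        results = list(executor.map(call_add, instances))

    print(results)  # [20, 21, 22, 23, 24]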

If you really want this to run in parallel, also consider porting your calculations to a GPU (you might need to move away from Python for that).

moooeeeep
  • Can you write your answer based on my example? This example is not helping me, as I don't have a single standalone function that I can apply to different tasks; instead, I have the same function inside different instances of a class. – user491626 Jul 17 '18 at 20:56
  • @user491626 Of course, you need to come up with a function that does what you want specifically. Just replace my `.append(20)` with `.add(20)` to fit your example. This is the pattern, which I think you need. You just need to fill in the blanks. – moooeeeep Jul 17 '18 at 21:02
  • It is not just that. Your function takes l as input. I mean, you have modified the function. I don't want to change my class (if possible). – user491626 Jul 17 '18 at 21:07
  • @user491626 Then it seems I don't know what you are actually asking about. – moooeeeep Jul 17 '18 at 21:16
  • Can you take a look at the updated question? I guess I can use multithreading. Can you also add how we can use multithreading in your answer? – user491626 Jul 18 '18 at 02:17
  • @user491626 Please have a look at my update. You can use threads or processes for parallelization, but you should make sure to understand the implications of the _global interpreter lock_. Also measure performance to evaluate if parallel execution actually is faster in the first place. – moooeeeep Jul 18 '18 at 11:51
  • Thanks for the update. I could use your answer to write my code in multiprocessing/multithreading form, but I don't know why doing that increases the runtime. But that's a different question; I asked it separately here: https://stackoverflow.com/questions/51407464/why-parallelizing-doesnt-decrease-the-runtime – user491626 Jul 18 '18 at 17:06
  • If I want to pass some other arguments (other than l) to the map function, how can I do that? – user491626 Jul 18 '18 at 18:04
  • @user491626 You mean something like this: https://stackoverflow.com/q/5442910/1025391 ? – moooeeeep Jul 18 '18 at 18:40
  • Yes, thanks. When I apply multithreading to my code, it works fine but it is slow. But when I remove .dummy to use multiprocessing, I get this error: `cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed`. I searched for this error, but the existing solutions didn't work. Do you have any idea why it works with multithreading but not multiprocessing? – user491626 Jul 18 '18 at 20:10
  • Is this correct? "If I want to run functions of different object instances in different processors, when you create those instances, I HAVE TO define them in different processors." Someone said so here in an answer to my question: https://stackoverflow.com/questions/51411098/error-in-multiprocessing-using-concurrent-futures?noredirect=1#comment89795637_51411098 – user491626 Jul 19 '18 at 16:48