
I am trying to call the same method in parallel on multiple instances, where the instances are all of the same class.

Sorry for the confusing wording.

Specifically, I want to run the following for loop in parallel:

for i in range(len(instances)):  # instances is a list of instances
    instances[i].do_some_computation_over_a_dataset()

Is it possible?

Note for future readers:

The code above is not the idiomatic way to iterate over a collection of instances in Python. This is how to iterate over them sequentially (i.e. non-parallel):

for i in instances:
    i.do_some_computation_over_a_dataset()
quamrana
ytutow
  • @quamrana, I want to make sure all instances have finished running the method. – ytutow Nov 30 '17 at 16:09
  • What makes you think that `Pool` doesn't wait? – quamrana Nov 30 '17 at 16:11
  • @quamrana, I don't know `Pool` very well; I was just guessing. – ytutow Nov 30 '17 at 16:15
  • The first code example in here: https://docs.python.org/3/library/multiprocessing.html obviously waits for all processes to finish so that it can print all the results. – quamrana Nov 30 '17 at 16:17
  • OK, thanks @quamrana. Is there possibly a difference between this question and the question you linked? Here we want to call the same method on multiple instances, while there they call the same method with different parameters. – ytutow Nov 30 '17 at 16:22
  • Just don't add any parameters (or you may have to pass an empty tuple, `()`?). – quamrana Nov 30 '17 at 16:24
  • You mean `zip(*pool.map(calc_stuff, ()))`? But how can `calc_stuff` be a method called on multiple instances? – ytutow Nov 30 '17 at 16:26

2 Answers


OK, let's do it. First, the code (based on the multiprocessing docs):

In [1]: from multiprocessing import Process

In [2]: def f():
   ...:     print(1)
   ...:     for i in range(100):
   ...:         # do something
   ...:         pass
   ...:

In [3]: p1 = Process(target=f)

In [4]: p1.start()

1
In [5]: p2 = Process(target=f)

In [6]: p2.start()

1
In [7]: import time

In [8]: def f():
   ...:     for i in range(100):
   ...:         print(i)
   ...:         # do something
   ...:         time.sleep(1)
   ...:         pass
   ...:
In [9]: p1 = Process(target=f)

In [10]: p1.start()

0
In [11]: p2 1
= Process(target=f)2
3
4
5
In [11]: p2 = Process(target=f)

In [12]: 6
p2.7
start8
In [12]: p2.start()

0
In [13]: 9

This is an example of how a function can be called in parallel. From In [10]: p1.start() onward you can see the output getting jumbled: process p1 keeps printing in the background while we type the commands that create and start p2.

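When running this from a Python script, you want to make sure the script only ends once all the processes have finished. You can do this by calling join() on each of them:

from multiprocessing import Process

def multi_process(instance_params, *funcs):
    # Start one process per function, passing the same argument tuple to each
    processes = []
    for f in funcs:
        prog = Process(target=f, args=instance_params)  # args must be a tuple
        prog.start()
        processes.append(prog)
    # Block until every process has finished
    for p in processes:
        p.join()

if __name__ == '__main__':
    multi_process(params, f, f)  # params: tuple of arguments for f, e.g. () for the f above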
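The same start/join pattern can be applied directly to the question's loop by giving each Process a bound method as its target. This is a minimal sketch, assuming a hypothetical `Dataset` class whose `mean` method stands in for `do_some_computation_over_a_dataset()`. Since a Process does not return a value, the results are sent back to the parent through a `multiprocessing.Queue`.

```python
from multiprocessing import Process, Queue


class Dataset:
    """Hypothetical stand-in for the question's instances."""

    def __init__(self, name, values):
        self.name = name
        self.values = values

    def mean(self, results):
        # Runs in a child process; send (name, mean) back to the parent
        results.put((self.name, sum(self.values) / len(self.values)))


if __name__ == '__main__':
    instances = [Dataset('a', [1, 2, 3]), Dataset('b', [10, 20]), Dataset('c', [5])]
    results = Queue()

    # One process per instance, each running the same method
    processes = [Process(target=inst.mean, args=(results,)) for inst in instances]
    for p in processes:
        p.start()

    # Drain the queue before joining so no child blocks on a full pipe
    means = dict(results.get() for _ in instances)
    for p in processes:
        p.join()

    print(means)
```

The results arrive in whatever order the processes finish, which is why they are collected into a dict keyed by name.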

Because of the GIL (Global Interpreter Lock), CPython threads can't execute Python code in parallel the way C++ or Java threads can. Read about it here. However, if your program spends more time on I/O operations than on CPU-intensive work, you can still use multithreading. For CPU-intensive tasks, multiprocessing is recommended.
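For the I/O-bound case, `multiprocessing.pool.ThreadPool` offers the same `map` interface as `Pool`, but it runs the calls in threads of the current process. Nothing is pickled, and the instances are shared rather than copied. This is a minimal sketch, assuming a hypothetical `Downloader` class whose `fetch` method simulates waiting on I/O with `time.sleep`:

```python
import time
from multiprocessing.pool import ThreadPool


class Downloader:
    """Hypothetical I/O-bound instance; time.sleep stands in for a network call."""

    def __init__(self, url):
        self.url = url
        self.size = None

    def fetch(self):
        time.sleep(0.5)  # the GIL is released while a thread sleeps or waits on I/O
        self.size = len(self.url)
        return self.size


instances = [Downloader('http://example.com/' + str(i)) for i in range(4)]

start = time.perf_counter()
with ThreadPool(4) as pool:
    # Nothing is pickled with threads, so a lambda works as the mapped function
    sizes = pool.map(lambda d: d.fetch(), instances)
elapsed = time.perf_counter() - start

# Threads share memory, so the original objects were updated in place
print(sizes, [d.size for d in instances])
print(f'{elapsed:.2f} s')  # close to one 0.5 s sleep, not four
```

For CPU-bound methods this version would give little or no speedup, because only one thread can run Python bytecode at a time.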

In a comment, @ytutow asked what the difference is between a pool of workers and a process. From PyMOTW:

The Pool class can be used to manage a fixed number of workers for simple cases where the work to be done can be broken up and distributed between workers independently.

The return values from the jobs are collected and returned as a list.

The pool arguments include the number of processes and a function to run when starting the task process (invoked once per child).
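The per-child start-up function mentioned in that last point is passed as `Pool(processes, initializer, initargs)`. This is a minimal sketch, assuming a hypothetical expensive setup step (here, building a small lookup table) that you want to run once per worker process rather than once per task:

```python
from multiprocessing import Pool

# Per-worker state, filled in once by init_worker inside each child process
_table = None


def init_worker(scale):
    global _table
    # Hypothetical expensive setup, run once when each worker starts
    _table = {n: n * scale for n in range(10)}


def lookup(n):
    # Every task handled by this worker reuses the table built by init_worker
    return _table[n]


if __name__ == '__main__':
    with Pool(processes=2, initializer=init_worker, initargs=(100,)) as pool:
        values = pool.map(lookup, [1, 2, 3])
    print(values)  # [100, 200, 300]
```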

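You can use Pool like this:

from multiprocessing import Pool

def your_instance_method(instance):
    # Runs in a worker process; return the result so Pool.map can collect it
    return instance.do_some_computation_over_a_dataset()

if __name__ == '__main__':
    instances = [instance_1, instance_2, instance_3]  # instances must be picklable
    with Pool(3) as p:
        # map blocks until every instance has been processed
        print(p.map(your_instance_method, instances))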
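One thing worth knowing when mapping over instances is that each instance is pickled and sent to a worker, so the method runs on a copy. Any attribute the method sets is lost when the worker finishes, and only the return value comes back. This is a minimal sketch with a hypothetical `Dataset` class:

```python
from multiprocessing import Pool


class Dataset:
    """Hypothetical instance type; defined at module level so it can be pickled."""

    def __init__(self, values):
        self.values = values
        self.mean = None

    def do_some_computation_over_a_dataset(self):
        # This assignment happens on the worker's unpickled copy, not the original
        self.mean = sum(self.values) / len(self.values)
        return self.mean


def your_instance_method(instance):
    return instance.do_some_computation_over_a_dataset()


if __name__ == '__main__':
    instances = [Dataset([1, 2, 3]), Dataset([4, 6]), Dataset([10])]
    with Pool(3) as p:
        means = p.map(your_instance_method, instances)

    before = [inst.mean for inst in instances]  # the originals were not modified

    # Copy the returned values back if the originals need to hold them
    for inst, m in zip(instances, means):
        inst.mean = m

    print(before, means)  # [None, None, None] [2.0, 5.0, 10.0]
```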

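As for the number of workers, a common general recommendation is 2 * cpu_cores workers. Note that Pool() with no argument starts os.cpu_count() workers, which is usually a reasonable starting point for CPU-bound tasks.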

Amit Tripathi
  • Thanks! What's the difference between `multi_process` and the `multiprocessing` library in Python? – ytutow Nov 30 '17 at 16:10
  • The multiple instances will run the same method over some dataset, for example, computing the mean. I believe it is a CPU-intensive task. – ytutow Nov 30 '17 at 16:11
  • @ytutow That's a CPU-intensive task. Use `multiprocessing`; it's part of Python's standard library. I am not sure there is any `multi_process` module in Python. Is it a third-party module? – Amit Tripathi Nov 30 '17 at 16:13
  • Sorry, I am not very familiar with `multiprocessing`. – ytutow Nov 30 '17 at 16:14
  • @ytutow The answer gives an example of how you can use it. Read the docs I have linked for more info. Does this answer your question? If so, could you mark it as accepted? – Amit Tripathi Nov 30 '17 at 16:16
  • Thanks, give me some time to read and understand this. – ytutow Nov 30 '17 at 16:17
  • Hi @Amit Tripathi, what is the difference between `from multiprocessing import Pool` and `from multiprocessing import Process`, i.e. `Pool` and `Process`? Is the former not suitable for this task? – ytutow Nov 30 '17 at 16:20
  • @ytutow see the edit to the answer for difference between the two. – Amit Tripathi Nov 30 '17 at 16:28
  • Hi @Amit Tripathi, do you mean `Pool` may break up some tasks? And that it is better for simple tasks? – ytutow Nov 30 '17 at 16:33
  • Hi @Amit Tripathi, sorry, I missed something: how can I pass a list of `instance[i].call_method` to `multi_process`? – ytutow Nov 30 '17 at 16:37
  • I have edited the answer to show how to pass the arguments; see also this SO question: https://stackoverflow.com/questions/1559125/string-arguments-in-python-multiprocessing. I also suggest reading the docs; they should clear up your doubts. – Amit Tripathi Nov 30 '17 at 16:43

This code shows the difference between a for loop and Pool when calling the same function on different instances:

from multiprocessing import Pool

instances = ['a','ab','abc','abcd']


def calc_stuff(i):
    return len(i)


if __name__ == '__main__':

    print('One at a time')
    for i in instances:
        print(len(i))

    print('Use Pool')
    with Pool(4) as pool:
        print(pool.map(calc_stuff, instances))

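Note the use of if __name__ == '__main__':

This guard keeps the Pool from being created again inside the worker processes, which re-import the main module when the spawn start method is used (the default on Windows and macOS).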

Output:

One at a time
1
2
3
4
Use Pool
[1, 2, 3, 4]
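The example above maps a plain function over strings. To call an actual method on each instance, as the question asks, you can pass the method looked up on the class. In Python 3, `Word.calc_stuff` (where `Word` is a hypothetical class) is an ordinary function that takes the instance as its first argument, so `pool.map` calls it once per instance. This is a minimal sketch:

```python
from multiprocessing import Pool


class Word:
    """Hypothetical class standing in for the question's instances."""

    def __init__(self, text):
        self.text = text

    def calc_stuff(self):
        return len(self.text)


if __name__ == '__main__':
    instances = [Word('a'), Word('ab'), Word('abc'), Word('abcd')]
    with Pool(4) as pool:
        # Word.calc_stuff is a plain function; map passes each instance as self
        results = pool.map(Word.calc_stuff, instances)
    print(results)  # [1, 2, 3, 4]
```

As with the string version, `map` returns the results in the same order as the input list.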
quamrana