OK, let's do it. First, the code (adapted from the multiprocessing docs):
In [1]: from multiprocessing import Process
In [2]: def f():
   ...:     print(1)
   ...:     for i in range(100):
   ...:         # do something
   ...:         pass
   ...:
In [3]: p1 = Process(target=f)
In [4]: p1.start()
1
In [5]: p2 = Process(target=f)
In [6]: p2.start()
1
In [7]: import time
In [8]: def f():
   ...:     for i in range(100):
   ...:         print(i)
   ...:         # do something
   ...:         time.sleep(1)
   ...:
In [9]: p1 = Process(target=f)
In [10]: p1.start()
0
In [11]: p2 = Process(target=f)
1
2
3
4
5
In [12]: p2.start()
0
6
7
8
In [13]: 9
This is an example of how a function can be called in parallel. Starting from In [10]: p1.start() you can see the output gets jumbled, because process p1 keeps printing in the background while we type in and start p2.
When running this in a Python script, you want to make sure the script only exits once all the child processes have finished. You can do this with join():
def multi_process(instance_params, *funcs):
    processes = []
    for f in funcs:
        # instance_params must be a tuple of positional arguments for f
        prog = Process(target=f, args=instance_params)
        prog.start()
        processes.append(prog)
    for p in processes:
        p.join()  # block until this process has finished

multi_process(params, f, f)
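To make the pattern above concrete, here is a minimal, self-contained sketch. The `worker` function and its argument are made up for illustration; the `if __name__ == "__main__":` guard is needed on platforms that spawn child processes (Windows, and macOS by default) so the children don't re-execute the top-level code.

```python
from multiprocessing import Process
import time

def worker(n):
    # stand-in for real work; `n` is just an illustrative parameter
    time.sleep(0.1)
    print(f"worker got {n}")

def multi_process(instance_params, *funcs):
    processes = []
    for f in funcs:
        prog = Process(target=f, args=instance_params)
        prog.start()
        processes.append(prog)
    for p in processes:
        p.join()  # wait for each child to finish

if __name__ == "__main__":
    # both workers run in parallel; multi_process returns
    # only after both have exited
    multi_process((1,), worker, worker)
```

Because `join()` blocks, `multi_process` only returns after every child process has exited, so the script can't finish early and orphan its children.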
Python doesn't have C++- or Java-style multithreading for CPU-bound work because of the GIL. Read about it here. Still, if your program does more I/O operations than CPU-intensive work, you can use multithreading, since threads release the GIL while blocked on I/O. For CPU-intensive tasks, multiprocessing is recommended.
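As a sketch of the I/O-bound case, here the `time.sleep` stands in for a blocking network or disk call (it releases the GIL just like real I/O), and the function name and inputs are made up:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(name):
    # stand-in for a blocking I/O call, e.g. a network request;
    # the GIL is released while the thread sleeps
    time.sleep(0.1)
    return f"fetched {name}"

names = ["a", "b", "c", "d"]

# four threads run the four "requests" concurrently, so the
# total wall time is roughly one sleep, not four
with ThreadPoolExecutor(max_workers=4) as ex:
    results = list(ex.map(fetch, names))

print(results)
```

The same `ThreadPoolExecutor` code would not speed up CPU-bound functions, because only one thread can execute Python bytecode at a time under the GIL.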
In the comments @ytutow asked what the difference is between a pool of workers and processes. From PyMOTW:
The Pool class can be used to manage a fixed number of workers for
simple cases where the work to be done can be broken up and
distributed between workers independently.
The return values from the jobs are collected and returned as a list.
The pool arguments include the number of processes and a function to
run when starting the task process (invoked once per child).
You can use Pool as:
def your_instance_method(instance):
    instance.do_some_computation_over_a_dataset()

with Pool(3) as p:
    instances = [instance_1, instance_2, instance_3]
    print(p.map(your_instance_method, instances))
About the correct number of workers, it's a general recommendation to have 2 * cpu_cores workers.
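Putting the two together, here is a minimal runnable sketch that sizes a Pool from the core count; `square` is a made-up stand-in for a CPU-bound task:

```python
import os
from multiprocessing import Pool

def square(x):
    # stand-in for a CPU-bound task
    return x * x

if __name__ == "__main__":
    # the 2 * cpu_cores rule of thumb; os.cpu_count() can
    # return None, hence the fallback to 1
    n_workers = 2 * (os.cpu_count() or 1)
    with Pool(n_workers) as p:
        print(p.map(square, range(10)))
```

Note that the function passed to `Pool.map` must be defined at module level (not a lambda or nested function) so it can be pickled and sent to the worker processes.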