I previously asked a question here about multi-threading inside multi-processing. Its results were hard to interpret and led me to a more general question: multi-threading vs. multi-processing.
I have gone through various posts on this topic, but none of them clearly says when to choose one over the other, or how to check which one suits a given workload. Most posts say that multi-threading is for I/O-bound work and multi-processing is for CPU-bound work. However, when I used both on a CPU-bound task, the results did not support the idea that one can blindly pick multi-threading for I/O-bound work and multi-processing for CPU-bound work.
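The usual rule of thumb comes from CPython's global interpreter lock (GIL): only one thread executes Python bytecode at a time, so a loop written in pure Python generally gains little or nothing from extra threads. A minimal sketch to check this on one's own machine, using a hypothetical pure-Python function `count_up` (not part of the code below); the timings it prints will vary by machine:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_up(n):
    # Hypothetical pure-Python CPU-bound task: the loop holds the GIL throughout.
    total = 0
    for i in range(n):
        total += i
    return total


def time_sequential(n, repeats):
    # Run the task `repeats` times in the main thread, one after another.
    start = time.perf_counter()
    results = [count_up(n) for _ in range(repeats)]
    return results, time.perf_counter() - start


def time_threaded(n, repeats, workers=4):
    # Run the same tasks in a thread pool; list() waits for every result.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as executor:
        results = list(executor.map(count_up, [n] * repeats))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    seq_results, seq_time = time_sequential(1_000_000, 4)
    thr_results, thr_time = time_threaded(1_000_000, 4)
    assert seq_results == thr_results
    print(f"sequential: {seq_time:.2f}s  threaded: {thr_time:.2f}s")
```

On CPython the two timings typically come out close to each other, because the threads take turns holding the GIL rather than running the loop in parallel.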
In my case the task is CPU-bound, yet the results favor multi-threading: I have observed that multi-threading sometimes beats multi-processing even for CPU-bound work. I am looking for a methodology that helps me pick one of the two.
Below is my analysis, in which I ran multi-process and multi-threaded code on an 8th-gen Intel i7, 8-core machine with 16 GB of RAM, using Python 3.7.2 (also tested on Python 3.8.2).
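Defining the required functions and variables:
import numpy as np
import time
import concurrent.futures

# 100 rows x 1,000,000 columns; executor.map iterates over the rows
a = np.arange(100000000).reshape(100, 1000000)

def list_double_value(x):
    # Python-level loop: doubles each element of a row and returns a list
    y = []
    for elem in x:
        y.append(2 * elem)
    return y

def double_value(x):
    # Single vectorized NumPy multiply over the whole row
    return 2 * x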
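The two functions compute the same values but do very different amounts of work in the interpreter. A small check on a 10-element stand-in row (the size is an arbitrary assumption, chosen only to keep the output readable):

```python
import numpy as np


def list_double_value(x):
    # Loops over the row in Python; each element is a NumPy scalar,
    # so the result is a Python list of NumPy integer scalars.
    y = []
    for elem in x:
        y.append(2 * elem)
    return y


def double_value(x):
    # One vectorized multiply over the whole row; returns a new ndarray.
    return 2 * x


row = np.arange(10)  # small stand-in for one row of `a`
looped = list_double_value(row)
vectorized = double_value(row)

print(type(looped), type(looped[0]))  # a list whose items are NumPy scalars
print(type(vectorized))               # a NumPy ndarray
assert np.array_equal(np.array(looped), vectorized)
```

Because `list_double_value` runs its loop as Python bytecode while `double_value` hands the whole row to NumPy in one call, Case 1 and Case 2 stress the GIL and inter-process data transfer quite differently.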
Case 1 (using a function that takes a list as input and multiplies every element by 2)
Multi-processing using the list_double_value function (took 145 seconds):
t = time.time()
with concurrent.futures.ProcessPoolExecutor() as executor:
    my_results = executor.map(list_double_value, a)  # takes a list and doubles its values
print(time.time() - t)
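One likely contributor to the 145 s versus 28 s gap, worth checking before blaming the executor: `ProcessPoolExecutor` has to pickle every row it sends to a worker and every list it sends back, while threads share `a` in memory. A list of a million NumPy scalars pickles larger than the array it came from. A sketch that measures both payloads for one row (it assumes the row shape of `a` above; the exact sizes depend on the NumPy version):

```python
import pickle

import numpy as np


def list_double_value(x):
    # Same function as in the question: returns a list of NumPy scalars.
    y = []
    for elem in x:
        y.append(2 * elem)
    return y


row = np.arange(1_000_000)  # same length as one row of `a`

# A process pool pickles each argument to send it to a worker
# and pickles each return value to send it back.
sent = len(pickle.dumps(row))
returned = len(pickle.dumps(list_double_value(row)))
print(f"argument: {sent / 1e6:.1f} MB, result: {returned / 1e6:.1f} MB")
```

Across 100 rows, both directions of that transfer, plus unpickling on the other side, are work that the thread pool never has to do.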
Multi-threading using the list_double_value function (took 28 seconds):
t = time.time()
with concurrent.futures.ThreadPoolExecutor() as executor:
    my_results = executor.map(list_double_value, a)
print(time.time() - t)
Case 2 (using a function that takes a value, here one row of a, and multiplies it by 2)
Multi-processing using the double_value function (took 2.73 seconds):
t = time.time()
with concurrent.futures.ProcessPoolExecutor() as executor:
    my_results = executor.map(double_value, a)
print(time.time() - t)
Multi-threading using the double_value function (took 0.2660 seconds):
t = time.time()
with concurrent.futures.ThreadPoolExecutor() as executor:
    my_results = executor.map(double_value, a)
print(time.time() - t)
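In Case 2 the work per row is a single vectorized multiply. NumPy releases the GIL inside many of its compiled loops on large enough arrays, so threads can overlap, and they read `a` without copying it; a process pool still pickles each row in both directions. A sketch that runs `double_value` in a thread pool and checks the result against one vectorized call (a smaller array is assumed so it runs quickly):

```python
import concurrent.futures

import numpy as np


def double_value(x):
    # The multiply runs inside NumPy's compiled loop, which can release the GIL.
    return 2 * x


a = np.arange(1_000_000).reshape(100, 10_000)  # smaller stand-in for `a`

with concurrent.futures.ThreadPoolExecutor() as executor:
    # Each thread receives a view of one row; no data is copied to hand it over.
    rows = list(executor.map(double_value, a))

result = np.vstack(rows)
assert np.array_equal(result, 2 * a)
```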
Given the above analysis, do we need to check which approach runs faster every time before writing multi-threaded or multi-processing code, and then opt for that one? Or is there a set of concrete rules for selecting one over the other?
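Measuring on representative inputs remains the most reliable check, but the measurement can at least be made repeatable: run the real function under both executors, collect every result inside the timed region, and compare. A minimal harness along those lines; `square_sum` and `time_executor` are hypothetical names standing in for the real workload:

```python
import concurrent.futures
import time


def square_sum(n):
    # Hypothetical pure-Python CPU-bound task used only for this benchmark.
    return sum(i * i for i in range(n))


def time_executor(executor_cls, func, inputs, **kwargs):
    """Run func over inputs with the given executor; return (results, seconds).

    Results are consumed with list() inside the timed region, so the
    measurement covers collecting every result, not just submitting tasks.
    """
    start = time.perf_counter()
    with executor_cls(**kwargs) as executor:
        results = list(executor.map(func, inputs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # The __main__ guard is required when worker processes are started with
    # "spawn" (the default on Windows, and on macOS since Python 3.8).
    inputs = [200_000] * 16
    for cls in (concurrent.futures.ThreadPoolExecutor,
                concurrent.futures.ProcessPoolExecutor):
        _, seconds = time_executor(cls, square_sum, inputs)
        print(f"{cls.__name__}: {seconds:.2f}s")
```

Swapping `square_sum` for `list_double_value` or `double_value`, with the rows of `a` as inputs, reproduces the comparison in the question; the gap between the two executors then hints at how much of the cost is interpreter work under the GIL versus moving data between processes.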
Also, please let me know whether these results are caused by the concurrent.futures library I used (I am not sure about the library either).
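`concurrent.futures` is a thin high-level layer: `ProcessPoolExecutor` is built on `multiprocessing` and `ThreadPoolExecutor` on `threading`, so the same trade-offs should show up with the lower-level pools. One way to check is to repeat the measurement with `multiprocessing.Pool` and `multiprocessing.pool.ThreadPool`. A sketch with a hypothetical plain-list workload `double_list`:

```python
import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


def double_list(x):
    # Hypothetical stand-in for list_double_value, operating on plain lists.
    return [2 * elem for elem in x]


def time_pool(pool_cls, func, inputs):
    # pool.map blocks until every result is back, so the timing covers all work.
    start = time.perf_counter()
    with pool_cls() as pool:
        results = pool.map(func, inputs)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    data = [list(range(100_000)) for _ in range(20)]
    for cls in (ThreadPool, Pool):
        _, seconds = time_pool(cls, double_list, data)
        print(f"{cls.__name__}: {seconds:.2f}s")
```

If the relative ordering of threads and processes matches what `concurrent.futures` showed, the library is unlikely to be the cause.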