Multi-threading in CPython cannot use more than one CPU in parallel because of the GIL (Global Interpreter Lock). To get around this limitation, we can use multiprocessing. I'm writing Python code to demonstrate this. Here is my code:
from math import sqrt
from time import time
from threading import Thread
from multiprocessing import Process


def time_recorder(job_name):
    """Record time consumption of running a function"""
    def deco(func):
        def wrapper(*args, **kwargs):
            print(f"Run {job_name}")
            start_epoch = time()
            func(*args, **kwargs)
            end_epoch = time()
            time_consume = end_epoch - start_epoch
            print(f"Time consumption of {job_name}: {time_consume}")
        return wrapper
    return deco


def calc_sqrt():
    """Consume the CPU"""
    i = 2147483647
    for j in range(20 * 1000 * 1000):
        i -= 1
        sqrt(i)


@time_recorder("one by one")
def one_by_one():
    for _ in range(8):
        calc_sqrt()


@time_recorder("multi-threading")
def multi_thread():
    t_list = list()
    for i in range(8):
        t = Thread(name=f'worker-{i}', target=calc_sqrt)
        t.start()
        t_list.append(t)
    for t in t_list:
        t.join()


@time_recorder("multi-processing")
def multi_process():
    p_list = list()
    for i in range(8):
        p = Process(name=f"worker-{i}", target=calc_sqrt)
        p.start()
        p_list.append(p)
    for p in p_list:
        p.join()


def main():
    one_by_one()
    print('-' * 40)
    multi_thread()
    print('-' * 40)
    multi_process()


if __name__ == '__main__':
    main()
Function "calc_sqrt()" is the CPU-consuming job, which calculates square root for 20 million times. Decorator "time_recorder" calculates the running time of the decorated functions. And there are 3 functions which run the CPU-consuming job one by one, in multiple threads and in multiple processes respectively.
By running the above code on my laptop, I got the following output:
Run one by one
Time consumption of one by one: 39.31295585632324
----------------------------------------
Run multi-threading
Time consumption of multi-threading: 39.36112403869629
----------------------------------------
Run multi-processing
Time consumption of multi-processing: 23.380358457565308
The time consumption of one_by_one() and multi_thread() is almost the same, which is as expected. But the time consumption of multi_process() is a little confusing. My laptop has an Intel Core i5-7300U CPU, which has 2 cores and 4 threads; Task Manager accordingly shows 4 (logical) CPUs. Task Manager also shows that the usage of all 4 CPUs is 100% during the execution. But the processing time dropped to only 1/2, not 1/4. Why? The operating system of my laptop is Windows 10 64-bit.
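One thing worth checking is how many physical cores, as opposed to logical CPUs, the interpreter actually sees. Here is a minimal sketch, assuming the third-party psutil package is installed (os.cpu_count() from the standard library only reports logical CPUs):

import os
import psutil  # third-party: pip install psutil

print("logical CPUs:  ", os.cpu_count())                    # 4 on the i5-7300U
print("physical cores:", psutil.cpu_count(logical=False))   # 2 on the i5-7300U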
Later, I tried this program on a Linux virtual machine and got the following output, which looks more reasonable:
Run one by one
Time consumption of one by one: 33.78603768348694
----------------------------------------
Run multi-threading
Time consumption of multi-threading: 34.396817684173584
----------------------------------------
Run multi-processing
Time consumption of multi-processing: 8.470374584197998
This time, the processing time with multi-processing dropped to 1/4 of that with multi-threading. The host of this Linux VM is equipped with an Intel Xeon E5-2670, which has 8 cores and 16 threads. The host OS is CentOS 7. The VM is assigned 4 vCPUs and runs Debian 10.
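For what it's worth, the same fan-out can also be written with multiprocessing.Pool, sizing the pool from os.cpu_count() instead of hard-coding 8 workers. This is a minimal sketch that reuses calc_sqrt and time_recorder from the script above; like the original, it must be called under the if __name__ == '__main__' guard on Windows:

import os
from multiprocessing import Pool

@time_recorder("process pool")
def multi_process_pool():
    # one worker per logical CPU; still 8 jobs in total
    n_workers = os.cpu_count()
    with Pool(processes=n_workers) as pool:
        for _ in range(8):
            pool.apply_async(calc_sqrt)
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for all 8 jobs to finish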
The questions are:
- Why didn't the processing time of the multi-processing job on my laptop drop to 1/4, but only to 1/2?
- Is it a CPU issue? That is, are the 4 hardware threads of the Intel Core i5-7300U not "truly parallel", so that they impact each other, while the Intel Xeon E5-2670 doesn't have that issue?
- Or is it an OS issue? That is, does Windows 10 not handle multi-processing well, so that processes impact each other when running in parallel?
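To help separate the two hypotheses, one experiment would be to run the job with 1, 2, 4, and 8 worker processes. Below is a minimal sketch reusing calc_sqrt from the script above; since each process does the same fixed amount of work, the wall time should stay flat as long as all workers truly run in parallel, and the point where it starts rising shows how many workers the machine can actually run at once (if that happens beyond 2 workers on the laptop, that would point at the 2 physical cores rather than at Windows):

from time import time
from multiprocessing import Process

def timed_run(n_procs):
    """Run calc_sqrt in n_procs parallel processes and return the wall time."""
    start = time()
    procs = [Process(target=calc_sqrt) for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time() - start

if __name__ == '__main__':
    for n in (1, 2, 4, 8):
        print(f"{n} process(es): {timed_run(n):.2f} s")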