I am running on a machine with two 16-core AMD 7302 processors (32 cores in total), on Red Hat 8.4 with Python 3.10.6.
I've recently started learning the multiprocessing library. Inspired by the first example on the documentation page, I wrote this small script:
```python
from multiprocessing import Pool
import numpy as np
import sys
import datetime

def f(x):
    return x**2

def main(DataType="List", NThr=2, Vectorize=False):
    N = 5*10**7   # number of elements
    n = NThr      # number of worker processes
    y = np.zeros(N)
    # Use list
    if DataType == "List":
        x = []
        for i in range(N):
            x.append(i)
    # Use Numpy
    elif DataType == "Numpy":
        x = np.zeros(N)
        for i in range(len(x)):
            x[i] = i
    # Run parallel code
    t0 = datetime.datetime.now()
    if n == 1:
        if DataType == "Numpy" and Vectorize == True:
            y = np.vectorize(f)(x)
        else:
            for i in range(len(x)):
                y[i] = f(x[i])
    else:
        with Pool(n) as p:
            y = p.map(f, x)
    t1 = datetime.datetime.now()
    dt = (t1 - t0).total_seconds()
    print("{} : Vect = {}, n = {}, time : {}s".format(DataType, Vectorize, n, dt))
    sys.exit(0)

if __name__ == "__main__":
    main()
```
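A variant of the parallel section that may be worth timing next to the one above (a swapped-in technique, not the code from the question): split the array into one contiguous block per worker with `np.array_split`, so each task pickles a single ndarray and `f` runs vectorized inside the worker. `parallel_square_blocks` is a hypothetical name used only for this sketch.

```python
from multiprocessing import Pool
import numpy as np

def f(x):
    # Works on a scalar or, as used here, on a whole ndarray block
    return x**2

def parallel_square_blocks(x, n_procs):
    """Hypothetical helper: square x using n_procs processes, one block per task.

    Each task pickles one ndarray (a single contiguous buffer) instead of
    millions of individual numpy.float64 scalars.
    """
    blocks = np.array_split(x, n_procs)
    with Pool(n_procs) as p:
        results = p.map(f, blocks)   # f is applied to each block, vectorized
    return np.concatenate(results)

if __name__ == "__main__":
    x = np.arange(10**6, dtype=float)
    y = parallel_square_blocks(x, 2)
    print(np.array_equal(y, x**2))   # True
```

The `if __name__ == "__main__":` guard matters here because, with the spawn or forkserver start methods, worker processes import the main module.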
I noticed that running `p.map()` over a NumPy array is substantially slower than running it over a list. Here is the output from several runs (`python mycode.py`), changing the arguments to `main` between runs:
```
Numpy : Vect = True, n = 1, time : 9.566441s
Numpy : Vect = False, n = 1, time : 16.00333s
Numpy : Vect = False, n = 2, time : 143.331352s
List : Vect = False, n = 1, time : 21.11657s
List : Vect = False, n = 2, time : 11.868897s
List : Vect = False, n = 5, time : 6.162561s
```
Look at the (Numpy, n=2) run: at 143 s it is about 12 times slower than the (List, n=2) run at 11.9 s, and also much slower than either (Numpy, n=1) run (9.6 s and 16.0 s).
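One hypothesis worth testing (my assumption; the timings alone don't establish it): `Pool.map` pickles its inputs and results to move them between processes, and iterating over a float64 array yields `numpy.float64` scalars, which pickle to far more bytes than small Python ints. The sketch below measures that difference; `pickled_size` is a hypothetical helper, and the chunk of 1000 elements is an arbitrary size chosen for illustration.

```python
import pickle
import numpy as np

def pickled_size(obj):
    """Hypothetical helper: number of bytes pickle produces for obj."""
    return len(pickle.dumps(obj))

# One element of each version of x
py_int = 12345             # element of the List version (Python int)
np_scalar = np.zeros(1)[0] # element of the Numpy version (numpy.float64)

print(type(np_scalar).__name__, pickled_size(np_scalar), "bytes")
print(type(py_int).__name__, pickled_size(py_int), "bytes")

# A batch of elements, roughly like one chunk that Pool.map sends to a worker
# (chunk size of 1000 is an arbitrary assumption)
chunk_list = list(range(1000))
chunk_np = list(np.arange(1000, dtype=float))   # list of numpy.float64 scalars
print("int chunk:", pickled_size(chunk_list), "bytes")
print("float64-scalar chunk:", pickled_size(chunk_np), "bytes")
```

The results (`f(x)` on a `numpy.float64` returns another `numpy.float64`) travel back to the parent the same way, so any per-element overhead is paid in both directions.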
Question:
What makes NumPy arrays take so long to run with the multiprocessing library, specifically when `NThr == 2`?
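For reading these timings, it may help to know how `Pool.map` batches its input when no `chunksize` is passed. Based on my reading of CPython's `multiprocessing/pool.py` (treat this as an assumption for versions other than 3.10), it aims for about four chunks per worker; the hypothetical helpers `default_chunksize` and `n_tasks` below reproduce that arithmetic for N = 5*10**7.

```python
def default_chunksize(n_items, n_workers):
    """Hypothetical helper: chunk size Pool.map picks when chunksize=None
    (mirrors CPython's heuristic, assumed here)."""
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize

def n_tasks(n_items, n_workers):
    """Hypothetical helper: number of chunks (tasks) sent to the pool."""
    if n_items == 0:
        return 0
    c = default_chunksize(n_items, n_workers)
    return -(-n_items // c)   # ceiling division

N = 5 * 10**7
for workers in (2, 5):
    print(workers, default_chunksize(N, workers), n_tasks(N, workers))
```

With 2 workers this gives 8 tasks of 6,250,000 elements each, so every task is a single, very large pickled message.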
EDIT:
As suggested in a comment, I ran both the (Numpy, n=2) and (List, n=2) versions through the profiler:
```python
>>> import cProfile
>>> from mycode import main
>>> cProfile.run('main()')
```
and compared the results side by side. Listed below are the most time-consuming calls, followed by the calls whose counts differ between the two runs.
For the Numpy version:
```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
# Time consuming
        1    0.000    0.000  138.997  138.997 pool.py:362(map)
        1    0.000    0.000  138.956  138.956 pool.py:764(wait)
        1    0.000    0.000  138.956  138.956 pool.py:767(get)
        4    0.000    0.000  138.957   34.739 threading.py:288(wait)
        4    0.000    0.000  138.957   34.739 threading.py:589(wait)
     14/1    0.000    0.000  145.150  145.150 {built-in method builtins.exec}
       19  138.957    7.314  138.957    7.314 {method 'acquire' of '_thread.lock' objects}
# Different number of calls
        6    0.000    0.000    0.088    0.015 popen_fork.py:24(poll)
        1    0.000    0.000    0.088    0.088 popen_fork.py:36(wait)
        1    0.000    0.000    0.088    0.088 process.py:142(join)
       10    0.000    0.000    0.000    0.000 process.py:99(_check_closed)
       18    0.000    0.000    0.000    0.000 util.py:48(debug)
       76    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        2    0.000    0.000    0.000    0.000 {built-in method numpy.zeros}
       17    0.000    0.000    0.000    0.000 {built-in method posix.getpid}
        6    0.088    0.015    0.088    0.015 {built-in method posix.waitpid}
        3    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
```
For the List version:
```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
# Time consuming
        1    0.000    0.000   13.961   13.961 pool.py:362(map)
        1    0.000    0.000   13.920   13.920 pool.py:764(wait)
        1    0.000    0.000   13.920   13.920 pool.py:767(get)
        4    0.000    0.000   13.921    3.480 threading.py:288(wait)
        4    0.000    0.000   13.921    3.480 threading.py:589(wait)
     14/1    0.000    0.000   24.475   24.475 {built-in method builtins.exec}
       19   13.921    0.733   13.921    0.733 {method 'acquire' of '_thread.lock' objects}
# Different number of calls
        7    0.000    0.000    0.132    0.019 popen_fork.py:24(poll)
        2    0.000    0.000    0.132    0.066 popen_fork.py:36(wait)
        2    0.000    0.000    0.132    0.066 process.py:142(join)
       12    0.000    0.000    0.000    0.000 process.py:99(_check_closed)
       19    0.000    0.000    0.000    0.000 util.py:48(debug)
       75    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        1    0.000    0.000    0.000    0.000 {built-in method numpy.zeros}
       18    0.000    0.000    0.000    0.000 {built-in method posix.getpid}
        7    0.132    0.019    0.132    0.019 {built-in method posix.waitpid}
 50000003    2.780    0.000    2.780    0.000 {method 'append' of 'list' objects}
```
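Listings like the ones above can also be produced already sorted by cumulative time, instead of comparing the default `cProfile.run` output by hand. A minimal sketch using `cProfile.Profile` and `pstats`; `profile_call` is a hypothetical helper. Note that cProfile only observes the parent process, so whatever the workers spend shows up in the parent as time waiting on a lock (the `acquire` rows above).

```python
import cProfile
import pstats

def profile_call(func, *args, top=15, **kwargs):
    """Hypothetical helper: profile func(*args, **kwargs) in this process
    and print the `top` entries sorted by cumulative time."""
    prof = cProfile.Profile()
    try:
        prof.runcall(func, *args, **kwargs)
    except SystemExit:
        # main() in the question ends with sys.exit(0); keep the stats anyway
        pass
    stats = pstats.Stats(prof)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return stats
```

For example, `from mycode import main` followed by `profile_call(main, DataType="Numpy", NThr=2)` would profile the (Numpy, n=2) case without editing the defaults of `main`.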
Note that the List version makes 50000003 calls to `append()`, compared with 3 in the Numpy version; the difference comes from the loop that initializes `x`.
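Because `t0` is taken after `x` is built, these `append` calls add to the total run time reported by cProfile but not to the time printed by `main`. For reference, a sketch of initializations that produce the same `x` without a per-element Python loop; `init_list` and `init_numpy` are hypothetical names, not part of the original script.

```python
import numpy as np

def init_list(N):
    """Hypothetical helper: same contents as the append loop in main(),
    i.e. [0, 1, ..., N-1] as Python ints."""
    return list(range(N))

def init_numpy(N):
    """Hypothetical helper: same contents as np.zeros(N) filled by index,
    i.e. float64 values 0.0, 1.0, ..., N-1."""
    return np.arange(N, dtype=np.float64)

if __name__ == "__main__":
    N = 10
    loop_list = []
    for i in range(N):
        loop_list.append(i)
    loop_arr = np.zeros(N)
    for i in range(len(loop_arr)):
        loop_arr[i] = i
    print(init_list(N) == loop_list)                 # True
    print(np.array_equal(init_numpy(N), loop_arr))   # True
```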