Is == comparison between a large np.array and a single number very slow in Python?
I used line_profiler to locate the bottleneck in my code. It turns out to be a simple comparison between a 1-D np.array and a constant, which accounts for 80% of the total runtime. Am I doing something wrong that makes it so slow? Is there any way to speed it up?
I also tried multiprocessing, but in the test code (Snippet 2) the Pool version is slower than both a sequential loop and a plain map. Could anyone explain why?
Any comments or suggestions are sincerely appreciated.
Snippet 1:
Line #      Hits         Time  Per Hit   % Time  Line Contents
    38     12635  305767927.0  24200.1     80.0  res = map(logicalEqual,assembly)
def logicalEqual(x):
    return F[:,-1] == x
assembly = [1,2,3,4,5,7,8,9,...,25]
F is an integer-typed np.array of shape (281900, 6).
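For anyone who wants to reproduce this, here is a minimal self-contained sketch of Snippet 1. Note the assumptions: F is filled with random integers (only the shape matches my real data), and assembly holds just the first few values of my real list. Since F[:, -1] is a strided view into a 2-D array, the sketch also times the same comparison against a contiguous copy of that column (last_col, a name introduced here), which may or may not make a difference on a given machine.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)

# Assumption: random integers stand in for the real data; only the shape matches.
F = rng.integers(1, 26, size=(281900, 6))

# Assumption: only the first few values of the real list, which is longer.
assembly = [1, 2, 3, 4, 5, 7, 8, 9]

# F[:, -1] is a strided view into F; copy it once into a contiguous 1-D array.
last_col = np.ascontiguousarray(F[:, -1])


def masks_strided(values):
    # Same comparison as logicalEqual: one boolean mask per value.
    return [F[:, -1] == x for x in values]


def masks_contiguous(values):
    # Identical comparison, but against the contiguous copy of the column.
    return [last_col == x for x in values]


# Both variants must produce the same masks.
a = masks_strided(assembly)
b = masks_contiguous(assembly)
assert all(np.array_equal(m, n) for m, n in zip(a, b))

for fn in (masks_strided, masks_contiguous):
    seconds = timeit.timeit(lambda: fn(assembly), number=20) / 20
    print(f"{fn.__name__}: {seconds * 1e3:.3f} ms per pass")
```

The timings depend on the machine and NumPy build, so I am not quoting numbers here.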
Snippet 2:
import numpy as np
from multiprocessing import Pool
import time
y=np.random.randint(2, 20, size=10000000)
def logicalEqual(x):
    return y == x
p=Pool()
start = time.time()
res0=p.map(logicalEqual, [1,2,3,4,5,7,8,9,10,11,12,13,14,15])
# p.close()
# p.join()
runtime = time.time()-start
print(f'runtime using multiprocessing.Pool is {runtime}')
res1 = []
start = time.time()
for x in [1,2,3,4,5,7,8,9,10,11,12,13,14,15]:
    res1.append(logicalEqual(x))
runtime = time.time()-start
print(f'sequential runtime is {runtime}')
start = time.time()
res2=list(map(logicalEqual,[1,2,3,4,5,7,8,9,10,11,12,13,14,15]))
runtime = time.time()-start
print(f'runtime is {runtime}')
runtime using multiprocessing.Pool is 0.3612203598022461
sequential runtime is 0.17401981353759766
runtime is 0.19697237014770508
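My unverified guess is that the Pool version pays for sending every result back to the parent process: Pool.map pickles each return value, and each result here is a boolean array with 10,000,000 elements. The sketch below only measures that pickle round trip in isolation, in a single process. It is not a full explanation, and the value 5 is an arbitrary choice.

```python
import pickle
import time

import numpy as np

y = np.random.randint(2, 20, size=10_000_000)

# One result of the kind logicalEqual returns: a boolean mask, one byte per element.
mask = y == 5

# Serialize and deserialize once, roughly what happens to each task's result
# when it travels from a worker process back to the parent.
start = time.perf_counter()
payload = pickle.dumps(mask)
restored = pickle.loads(payload)
elapsed = time.perf_counter() - start

print(f"mask size: {mask.nbytes / 1e6:.1f} MB, pickled size: {len(payload) / 1e6:.1f} MB")
print(f"pickle round trip for one mask: {elapsed:.4f} s")
```

If the round trip for one mask is comparable to the time of one comparison, the transfer cost alone could cancel out any gain from running the comparisons in parallel.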