
Is an == comparison between a large np.array and a single number very slow in Python? I used line_profiler to locate the bottleneck in my code. The bottleneck is just a simple comparison between a 1d np.array and a constant number, yet it accounts for 80% of the total runtime. Did I do anything wrong to make it so slow? Is there any way to accelerate it?

I tried to use multiprocessing; however, in the test code (snippet 2), using multiprocessing is slower than running sequentially or using map directly. Could anyone explain this phenomenon?

Any comments or suggestions are sincerely appreciated.

Snippet 1:

Line # Hits Time Per Hit %Time Line Contents

38 12635 305767927.0 24200.1 80.0 res = map(logicalEqual,assembly)

def logicalEqual(x):
    return F[:,-1] == x

assembly = [1,2,3,4,5,7,8,9,...,25]

F is an int-typed np.array of shape (281900, 6)
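
For reference, a minimal self-contained version of snippet 1 might look like the sketch below; the values in F and the full assembly list are made-up stand-ins (the real list is elided above), and only the shape, dtype, and call pattern match my code:

import numpy as np

# Hypothetical stand-ins: only shape, dtype and call pattern match snippet 1
F = np.random.randint(0, 26, size=(281900, 6))   # int array, shape (281900, 6)
assembly = list(range(1, 26))                     # placeholder for the elided list

def logicalEqual(x):
    # Boolean mask over the last column of F
    return F[:, -1] == x

# On Python 3, map() is lazy; list() forces the comparisons to actually run
res = list(map(logicalEqual, assembly))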

Snippet 2:

import numpy as np
from multiprocessing import Pool
import time

y=np.random.randint(2, 20, size=10000000)

def logicalEqual(x):
    return y == x

p=Pool()
start = time.time()
res0=p.map(logicalEqual, [1,2,3,4,5,7,8,9,10,11,12,13,14,15])
# p.close()
# p.join()
runtime = time.time()-start
print(f'runtime using multiprocessing.Pool is {runtime}')

res1 = []
start = time.time()
for x in [1,2,3,4,5,7,8,9,10,11,12,13,14,15]:
    res1.append(logicalEqual(x))
runtime = time.time()-start
print(f'sequential runtime is {runtime}')


start = time.time()
res2=list(map(logicalEqual,[1,2,3,4,5,7,8,9,10,11,12,13,14,15]))
runtime = time.time()-start
print(f'runtime is {runtime}')

runtime using multiprocessing.Pool is 0.3612203598022461
sequential runtime is 0.17401981353759766
runtime is  0.19697237014770508
Albert G Lieu
    `map` doesn't actually run the function unless you iterate over the resulting iterator on Python 3. Not running functions is pretty fast, as you might guess. – user2357112 Jun 04 '19 at 20:18
  • @user2357112 Yes, you are right. I overlooked that point. Now I corrected the misleading part in my question. Thank you very much. – Albert G Lieu Jun 04 '19 at 20:49
  • How big is `F`? What is `x`, what is the datatype of `F`? – user2699 Jun 04 '19 at 20:55
  • F is an int-typed np.array sized (281900, 6) – Albert G Lieu Jun 04 '19 at 20:59
  • I don't think you realise how much work you are asking the program to do. You are essentially asking the program to compare ~10 numbers to ~10 million other numbers. That's roughly 1/10th of a billion comparisons. CPUs are typically rated in the order of GHz (or billions of operations per second). Therefore, it is expected that your program ran in approximately 1/10th of a second. – Dunes Jun 04 '19 at 21:00

1 Answer


Array comparison is fast, since it is done in C code, not Python.

x = np.random.rand(1000000)
y = 4.5
test = 0.55
%timeit x == test
386 µs ± 4.68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit y == test
33.2 ns ± 0.121 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

So, comparing one Python float to another takes about 33 ns, while comparing 1E6 NumPy floats takes only about 11700 times longer (386 µs / 33 ns), despite comparing 1,000,000 times as many values. The same is true for ints (377 µs vs 34 ns). But as Dunes mentioned in a comment, comparing a lot of values takes a lot of cycles; there is nothing you can do about that.
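
To see how this scales when all of snippet 2's test values are checked at once, here is a minimal sketch (assuming snippet 2's y and its 14 test values) that does every comparison in a single broadcast call; the Python-level call overhead shrinks, but the element-wise work, and therefore the runtime, stays roughly the same:

import numpy as np
import time

y = np.random.randint(2, 20, size=10_000_000)
tests = np.array([1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15])

start = time.time()
# Broadcasting builds one (14, 10_000_000) boolean array instead of 14 separate results
res_broadcast = y[None, :] == tests[:, None]
print(f'broadcast runtime is {time.time() - start}')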

Olaf
  • Thank you very much for the analysis and for answering my question. It looks like the comparison has reached the CPU clock rate. But can I use multiprocessing to parallelize the computation? As shown in snippet 2, multiprocessing.Pool() did not accelerate the computation. I do not want to spend 80% of the runtime on a simple logical comparison. I do not understand why multiprocessing did not improve performance in this case, though I have seen performance gains achieved with multiprocessing in Python. – Albert G Lieu Jun 04 '19 at 21:19
  • The startup time for a process could hinder the results here. [This answer](https://stackoverflow.com/a/52076791/11500265) lists the costs of processes in Python. This means you would see a speedup only for longer-running tasks, where the startup cost is low in comparison to total runtime. – Olaf Jun 04 '19 at 21:29
  • @AlbertGLieu because the overhead of serializing your array, sending it to the new process, then deserializing it, *then* doing the calculation, then serializing it again and sending it back is not worth any gains of parallelism. Especially for an array of that small size. – juanpa.arrivillaga Jun 04 '19 at 23:18
  • Juanpa, I understand the overhead caused by the setup of multiprocessing, but 80% of the runtime in my code was taken by such a simple comparison. I am aware of how big my np.array is and how many times the comparison has been carried out. Is there any way to accelerate the comparison? Would Pycuda help with that? – Albert G Lieu Jun 04 '19 at 23:36
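
As a rough way to gauge the serialization overhead mentioned in the comments above, one can measure how large each worker's boolean result is once pickled for transfer back to the parent process; this is a sketch assuming snippet 2's y:

import pickle
import numpy as np

y = np.random.randint(2, 20, size=10_000_000)

# Pool.map pickles each worker's return value to send it back to the parent
result = (y == 1)
size_mb = len(pickle.dumps(result)) / 1e6
print(f'each boolean result pickles to ~{size_mb:.1f} MB; 14 tasks ship ~{14 * size_mb:.0f} MB between processes')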