python massive performance difference array iteration vs "if in"

Question

Both code snippets below check whether an element exists in the array, but the first approach takes < 100 ms while the second takes ~6 seconds.

Does anyone know why?

```python
import numpy as np
import time

xs = np.random.randint(90000000, size=8000000)

# Approach 1: membership test with `in`
start = time.monotonic()
is_present = -4 in xs
end = time.monotonic()

print('exec time:', round(end - start, 3), 'sec')  # 100 milliseconds

# Approach 2: explicit Python loop over the array
start = time.monotonic()
for x in xs:
    if x == -4:
        break
end = time.monotonic()

print('exec time:', round(end - start, 3), 'sec')  # 6000 milliseconds
```


Related: https://stackoverflow.com/questions/8385602/why-are-numpy-arrays-so-fast and https://medium.com/@gough.cory/performance-of-numpy-array-vs-python-list-194c8e283b65 — Pranav Hosangadi, May 02 '21 at 09:28
Try this with PyPy rather than CPython: it is much faster, and the gap narrows. The reason is that CPython is a (slow) *interpreter*. The first snippet executes an optimized native C call, while the second uses the interpreter to iterate over the array (which is extremely slow compared to doing the same in native compiled code). — Jérôme Richard, May 02 '21 at 11:57

score 3 · Accepted Answer · answered May 02 '21 at 09:16


NumPy is specifically built to accelerate this kind of code. It is written in C, with almost all of the Python overhead removed. By comparison, your second attempt is pure Python, so it takes much longer to loop through all the elements.
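To make the difference concrete, here is a minimal sketch that times a vectorized membership check against an explicit Python loop. The helper names (`contains_vectorized`, `contains_python_loop`, `time_call`) and the smaller array size are illustrative assumptions, not part of the question, and the exact timings will vary by machine.

```python
import time

import numpy as np


def contains_vectorized(arr, value):
    # The element-wise comparison and the reduction both run inside
    # NumPy's compiled loops, with no per-element interpreter work.
    return bool(np.any(arr == value))


def contains_python_loop(arr, value):
    # Every iteration goes through the interpreter: one element is
    # wrapped in a NumPy scalar object and then compared.
    for x in arr:
        if x == value:
            return True
    return False


def time_call(fn, *args):
    # Hypothetical helper: returns the result and the elapsed seconds.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


rng = np.random.default_rng(0)
# Assumption: a smaller array than in the question, to keep the run short.
xs = rng.integers(0, 90_000_000, size=200_000)

found_fast, t_fast = time_call(contains_vectorized, xs, -4)
found_slow, t_slow = time_call(contains_python_loop, xs, -4)

print(f"vectorized:  found={found_fast}, {t_fast:.4f} s")
print(f"python loop: found={found_slow}, {t_slow:.4f} s")
```

Both functions return the same answer. Since `-4` never occurs in an array of non-negative integers, each one has to scan every element, which is the worst case for the loop.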


AntiMatterDynamite
