Both code snippets below check whether an element exists in the array, but the first approach takes < 100 ms while the second takes ~6 seconds.

Does anyone know why?

```
import numpy as np
import time

xs = np.random.randint(90000000, size=8000000)

# Approach 1: NumPy membership test
start = time.monotonic()
is_present = -4 in xs
end = time.monotonic()

print('exec time:', round(end - start, 3), 'sec')  # ~100 milliseconds

# Approach 2: explicit Python loop over the array
start = time.monotonic()
for x in xs:
    if x == -4:
        break
end = time.monotonic()

print('exec time:', round(end - start, 3), 'sec')  # ~6000 milliseconds
```

sbr
  • Related: https://stackoverflow.com/questions/8385602/why-are-numpy-arrays-so-fast and https://medium.com/@gough.cory/performance-of-numpy-array-vs-python-list-194c8e283b65 – Pranav Hosangadi May 02 '21 at 09:28
  • Try this with PyPy rather than CPython: the loop becomes much faster and the gap narrows. The reason is that CPython is a (slow) *interpreter*. The first snippet executes an optimized native C call, while the second uses the interpreter to iterate over the array, which is extremely slow compared to native compiled code. – Jérôme Richard May 02 '21 at 11:57

1 Answer

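NumPy is specifically built to accelerate this kind of code. It is written in C, so `-4 in xs` runs as a compiled loop over the array with almost all of the Python overhead removed. Your second attempt, by comparison, is a pure-Python loop: the interpreter handles every element one at a time, so it takes much longer to get through all 8 million of them.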
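To make the difference concrete, here is a minimal sketch that runs the same membership test both ways. The helper names `contains_vectorized` and `contains_python_loop` are made up for illustration, and the array is smaller than the one in the question so the slow path finishes quickly. Exact timings will vary by machine.

```python
import time
import numpy as np


def contains_vectorized(arr, value):
    """Hypothetical helper: the comparison and the reduction both run inside NumPy."""
    return bool((arr == value).any())


def contains_python_loop(arr, value):
    """Hypothetical helper: the interpreter visits each element individually."""
    for x in arr:
        if x == value:
            return True
    return False


# Assumption: a smaller array than in the question, seeded for repeatability.
rng = np.random.default_rng(0)
xs = rng.integers(0, 90_000_000, size=200_000)

for fn in (contains_vectorized, contains_python_loop):
    start = time.perf_counter()
    found = fn(xs, -4)
    elapsed = time.perf_counter() - start
    print(f"{fn.__name__}: found={found}, {elapsed:.4f} sec")
```

If you do need an explicit loop, calling `xs.tolist()` first and iterating over the resulting list of plain Python ints is usually noticeably faster than iterating over the array directly, though still far slower than the vectorized test.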

AntiMatterDynamite