
In the following snippet, why is py_sqrt2 almost twice as fast as np_sqrt2?

from time import time
from numpy import sqrt as npsqrt
from math import sqrt as pysqrt

NP_SQRT2 = npsqrt(2.0)
PY_SQRT2 = pysqrt(2.0)

def np_sqrt2():
    return NP_SQRT2

def py_sqrt2():
    return PY_SQRT2

def main():
    samples = 10000000

    it = time()
    E = sum(np_sqrt2() for _ in range(samples)) / samples
    print("executed {} np_sqrt2's in {:.6f} seconds E={}".format(samples, time() - it, E))

    it = time()
    E = sum(py_sqrt2() for _ in range(samples)) / samples
    print("executed {} py_sqrt2's in {:.6f} seconds E={}".format(samples, time() - it, E))


if __name__ == "__main__":
    main()

$ python2.7 snippet.py 
executed 10000000 np_sqrt2's in 1.380090 seconds E=1.41421356238
executed 10000000 py_sqrt2's in 0.855742 seconds E=1.41421356238
$ python3.6 snippet.py 
executed 10000000 np_sqrt2's in 1.628093 seconds E=1.4142135623841212
executed 10000000 py_sqrt2's in 0.932918 seconds E=1.4142135623841212

Notice that they are constant functions that just load from precomputed globals with the same value, and that the constants only differ in how they were computed at program start.

Moreover the disassembly of these functions show that they do as expected and only access the global constants.

In [73]: dis(py_sqrt2)                                                                                                                                                                            
  2           0 LOAD_GLOBAL              0 (PY_SQRT2)
              2 RETURN_VALUE

In [74]: dis(np_sqrt2)                                                                                                                                                                            
  2           0 LOAD_GLOBAL              0 (NP_SQRT2)
              2 RETURN_VALUE
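
A quick sanity check (not part of the original post) confirming that the two constants really do compare equal, so the stored value itself cannot explain the timing gap:

```python
from math import sqrt as pysqrt
from numpy import sqrt as npsqrt

# Both globals hold the same numeric value; only their origin differs.
assert npsqrt(2.0) == pysqrt(2.0)
```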
Mateo de Mayo

2 Answers

Because you are sending it to C every time for just one value.

Try the following instead:

import time
import numpy

# one vectorized call over 10k values
t0 = time.time()
numpy.sqrt([2] * 10000)
t1 = time.time()
print("Took %0.3fs to do 10k sqrt(2) in one numpy call" % (t1 - t0))

# 10k separate calls, one value each
t0 = time.time()
for i in range(10000):
    numpy.sqrt(2)
t1 = time.time()
print("Took %0.3fs to do 10k individual numpy.sqrt(2) calls" % (t1 - t0))
Joran Beasley
  • Thanks for the answer, but I'm not sure this is answering the original question. I do understand that it is good to reduce function calls, but in the question's example both loops make the same number of function calls, to functions that are apparently identical. My question is more about what Python is doing behind the scenes that makes access to one of the globals (`NP_SQRT2`) a lot slower than to the other (`PY_SQRT2`). – Mateo de Mayo May 31 '20 at 12:23
  • numpy needs to send it to C/C++ every time, so there's a tiny bit of extra overhead versus keeping it in Python (which the normal sqrt function does). numpy is designed to work over a large set of data, not a single point over and over. – Joran Beasley May 31 '20 at 15:39
  • Now that I noticed that the constant was a numpy type your answer makes more sense. Thank you. – Mateo de Mayo May 31 '20 at 22:28

After running `perf record` on two versions of the script, one using only `PY_SQRT2` and the other only `NP_SQRT2`, it seems that the version using the numpy constant makes extra calls.

This made me realize the two constants have different types:

In [4]: type(PY_SQRT2)                                                          
Out[4]: float

In [5]: type(NP_SQRT2)                                                          
Out[5]: numpy.float64

So summing (and maybe loading?) `numpy.float64` values is slower than working with native `float`s.
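
A minimal sketch (not from the original post; timings are illustrative and require Python 3.6+) showing that summing `numpy.float64` scalars is slower than summing native `float`s, and that converting the constant back with `float()` once closes the gap:

```python
from math import sqrt as pysqrt
from numpy import sqrt as npsqrt
from timeit import timeit

PY_SQRT2 = pysqrt(2.0)          # native Python float
NP_SQRT2 = npsqrt(2.0)          # numpy.float64 scalar
FIXED_SQRT2 = float(NP_SQRT2)   # converted to a native float once

assert PY_SQRT2 == NP_SQRT2                  # same value...
assert type(PY_SQRT2) is not type(NP_SQRT2)  # ...different types

n = 1_000_000
t_py = timeit("sum(PY_SQRT2 for _ in range(n))", globals=globals(), number=1)
t_np = timeit("sum(NP_SQRT2 for _ in range(n))", globals=globals(), number=1)
t_fix = timeit("sum(FIXED_SQRT2 for _ in range(n))", globals=globals(), number=1)
print("float: %.3fs  float64: %.3fs  float(float64): %.3fs" % (t_py, t_np, t_fix))
```

Each `+` in the `sum` over `numpy.float64` values goes through numpy's scalar arithmetic machinery instead of CPython's fast path for `float`, which accounts for the per-iteration overhead.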

This answer helped as well.

Mateo de Mayo