I add a single integer to an array of 1000 integers. The addition is about 25% faster when I first cast the single integer from numpy.int64 to a python-native int.
Why? Should I, as a general rule of thumb, convert the single number to a native python type for single-number-to-array operations with arrays of roughly this size?
Note: this may be related to my previous question, Conjugating a complex number much faster if number has python-native complex type.
import numpy as np
nnu = 10418
nnu_use = 5210
a = np.random.randint(nnu, size=1000)       # array of 1000 integers
b = np.random.randint(nnu_use, size=1)[0]   # single numpy.int64 scalar
%timeit a + b # --> 3.9 µs ± 19.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit a + int(b) # --> 2.87 µs ± 8.07 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
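For completeness, the cast itself can be timed on its own (a sketch using the same b as above), to see how much of the difference is just the conversion overhead; I have not included that number here.
%timeit int(b)   # cost of the numpy.int64 -> int cast alone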
Note that the speed-up can be enormous (roughly a factor of 50) for scalar-to-scalar operations as well, as seen below:
np.random.seed(100)
a = (np.random.rand(1))[0]                                 # numpy.float64 scalar
a_native = float(a)                                        # python-native float
b = complex(np.random.rand(1) + 1j*np.random.rand(1))      # python-native complex
c = (np.random.rand(1) + 1j*np.random.rand(1))[0]          # numpy.complex128 scalar
c_native = complex(c)                                      # python-native complex
%timeit a * (b - b.conjugate() * c) # 6.48 µs ± 49.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit a_native * (b - b.conjugate() * c_native) # 283 ns ± 7.78 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit a * b # 5.07 µs ± 17.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit a_native * b # 94.5 ns ± 0.868 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
Update: Could it be that the latest numpy release fixes the speed difference? The release notes of numpy 1.23 mention that scalar operations are now much faster; see https://numpy.org/devdocs/release/1.23.0-notes.html#performance-improvements-and-changes and https://github.com/numpy/numpy/pull/21188. I am using python 3.7.6 and numpy 1.21.2.
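For anyone already on numpy >= 1.23 who wants to check, here is a minimal sketch of the same comparison (same sizes and value ranges as above), with the numpy version printed so the timings can be attributed to a release:
import numpy as np
print(np.__version__)                      # record which numpy the timings belong to
a = np.random.randint(10418, size=1000)    # array of 1000 integers
b = np.random.randint(5210, size=1)[0]     # single numpy.int64 scalar
%timeit a + b
%timeit a + int(b)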