When iterating over NumPy arrays, Numba seems dramatically faster than Cython.
What Cython optimizations am I possibly missing?
Here is a simple example:
Pure Python code:
```python
import numpy as np

def f(arr):
    res = np.zeros(len(arr))
    for i in range(len(arr)):
        res[i] = arr[i] ** 2
    return res

arr = np.random.rand(10000)
%timeit f(arr)
```

Out: 4.81 ms ± 72.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Cython code (within Jupyter):
```python
%load_ext Cython
```

```cython
%%cython
import numpy as np
cimport numpy as np
cimport cython

from libc.math cimport pow

#@cython.boundscheck(False)
#@cython.wraparound(False)
cpdef f(double[:] arr):
    cdef np.ndarray[dtype=np.double_t, ndim=1] res
    res = np.zeros(len(arr), dtype=np.double)
    cdef double[:] res_view = res
    cdef int i
    for i in range(len(arr)):
        res_view[i] = pow(arr[i], 2)
    return res
```

```python
arr = np.random.rand(10000)
%timeit f(arr)
```

Out: 445 µs ± 5.49 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
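For completeness, here is the same cell with the two directives enabled, in case the commented-out `@cython.boundscheck(False)` / `@cython.wraparound(False)` lines are what I am missing (I have not verified how much they change the timing):

```cython
%%cython
import numpy as np
cimport numpy as np
cimport cython

from libc.math cimport pow

@cython.boundscheck(False)  # skip bounds checks on memoryview indexing
@cython.wraparound(False)   # disable negative-index handling
cpdef f(double[:] arr):
    cdef np.ndarray[dtype=np.double_t, ndim=1] res
    res = np.zeros(len(arr), dtype=np.double)
    cdef double[:] res_view = res
    cdef int i
    for i in range(len(arr)):
        res_view[i] = pow(arr[i], 2)
    return res
```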
Numba code:
```python
import numpy as np
import numba as nb

@nb.jit(nb.float64[:](nb.float64[:]))
def f(arr):
    res = np.zeros(len(arr))
    for i in range(len(arr)):
        res[i] = arr[i] ** 2
    return res

arr = np.random.rand(10000)
%timeit f(arr)
```

Out: 9.59 µs ± 98.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
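All three variants should return identical values; here is a quick sanity check of that (written with plain NumPy only, so it runs without Numba or Cython installed):

```python
import numpy as np

def f_loop(arr):
    # same elementwise-square loop as in the snippets above, plain Python
    res = np.zeros(len(arr))
    for i in range(len(arr)):
        res[i] = arr[i] ** 2
    return res

arr = np.random.rand(10000)
# the looped result matches NumPy's vectorized square elementwise
assert np.allclose(f_loop(arr), np.square(arr))
```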
In this example, Numba is almost 50 times faster than Cython. As a Cython beginner, I assume I am missing something.

Of course, in this simple case, using NumPy's vectorized `np.square` function would have been far more suitable:
```python
%timeit np.square(arr)
```

Out: 5.75 µs ± 78.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
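For anyone reproducing this outside Jupyter, the `%timeit` calls can be approximated with the standard `timeit` module (a rough harness, not the exact notebook measurement; the pure-Python loop and `np.square` are shown since they need no extra dependencies):

```python
import timeit
import numpy as np

def f_loop(arr):
    # pure-Python elementwise square, as in the first snippet
    res = np.zeros(len(arr))
    for i in range(len(arr)):
        res[i] = arr[i] ** 2
    return res

arr = np.random.rand(10000)

# time 50 calls of each; report mean time per call
t_loop = timeit.timeit(lambda: f_loop(arr), number=50)
t_square = timeit.timeit(lambda: np.square(arr), number=50)
print(f"loop: {t_loop / 50 * 1e3:.3f} ms/call, np.square: {t_square / 50 * 1e6:.3f} µs/call")
```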