According to the snippet below, performing an in-place indexed addition with a numba jit-compiled function is roughly 60 times faster than with numpy's own np.add.at ufunc method.
This would be understandable for a function performing multiple numpy operations, as explained in this question.
But here the improvement concerns a single, simple numpy ufunc... So why is numba so much faster? I was (naively?) expecting that the numpy ufunc internally uses compiled code and that a task as simple as an addition would already be close to optimally implemented.
More generally: should I expect such dramatic performance differences for other numpy functions? Is there a way to predict when it is worth rewriting a function and numba-jitting it?
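For context: I am using np.add.at rather than a plain fancy-indexed addition because my indices repeat. With repeated indices, target[indices] += values applies each index only once, whereas np.add.at accumulates every occurrence, e.g.:

import numpy as np
t = np.zeros(3)
t[[0, 0, 1]] += 1.0           # buffered: index 0 is incremented only once -> t is [1. 1. 0.]
t = np.zeros(3)
np.add.at(t, [0, 0, 1], 1.0)  # unbuffered: every occurrence accumulates -> t is [2. 1. 0.]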
The code:
import numpy as np
import timeit
import numba
N = 200
target1 = np.ones( N )
target2 = np.ones( N )
# we're going to add these values :
addedValues = np.random.uniform( size=1000000 )
# into these positions :
indices = np.random.randint(N,size=1000000)
@numba.njit
def addat(target, index, tobeadded):
    for i in range(index.size):
        target[index[i]] += tobeadded[i]
# pre-run to jit compile the function
addat( target2, indices, addedValues)
target2 = np.ones( N ) # reset
npaddat = np.add.at
t1 = timeit.timeit( "npaddat( target1, indices, addedValues)", number=3, globals=globals())
t2 = timeit.timeit( "addat( target2, indices, addedValues)", number=3,globals=globals())
assert( (target1==target2).all() )
print("np.add.at time=",t1, )
print("jit-ed addat time =",t2 )
On my computer I get:
np.add.at time= 0.21222890191711485
jit-ed addat time = 0.003389443038031459
so more than a factor of 60 improvement...
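As a side check (not included in the timings above), the plain whole-array call to the same ufunc can be timed the same way; I would expect that path to be fast, which would suggest the slowdown is specific to the .at indexed path. This assumes it is run in the same session as the snippet above (the dense array is only introduced for this extra check):

# separate sanity check: time the plain, whole-array ufunc call (no .at, no fancy indexing)
dense = np.zeros(1000000)
t3 = timeit.timeit("np.add(dense, addedValues, out=dense)", number=3, globals=globals())
print("plain np.add time =", t3)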