I have two implementations of the same function. One is based on Python for loops, as below:
import time
import numba as nb
import numpy as np

@nb.autojit()
def ODEfunction(r):
    tic = time.process_time()
    NP = 10
    s = 1
    l = 100
    f = np.zeros(len(r))
    lbound = -4*(12*s**12/(-0.5*l-r[0])**13 - 6*s**6/(-0.5*l-r[0])**7)
    rbound = -4*(12*s**12/(0.5*l-r[NP-1])**13 - 6*s**6/(0.5*l-r[NP-1])**7)
    f[0:NP] = r[NP:2*NP]
    for i in range(NP):
        fi = 0.0
        for j in range(NP):
            if j != i:
                fij = -4*(12*s**12/(r[j]-r[i])**13 - 6*s**6/(r[j]-r[i])**7)
                fi = fi + fij
        f[i+NP] = fi
    f[NP] = f[NP] + lbound
    f[2*NP-1] = f[2*NP-1] + rbound
    toc = time.process_time()
    print(toc - tic)
    return f
The other is the equivalent vectorized version of this code:
@nb.autojit()
def ODEfunction(r):
    tic = time.process_time()
    NP = 10
    s = 1
    l = 100
    f = np.zeros(len(r))
    lbound = -4*(12*s**12/(-0.5*l-r[0])**13 - 6*s**6/(-0.5*l-r[0])**7)
    rbound = -4*(12*s**12/(0.5*l-r[NP-1])**13 - 6*s**6/(0.5*l-r[NP-1])**7)
    f[0:NP] = r[NP:2*NP]
    ri = r[0:NP]
    rj = r[0:NP]
    rij = np.subtract.outer(rj, ri)
    fij = -4*(12*s**12/rij**13 - 6*s**6/rij**7)
    fij[np.diag_indices(NP)] = 0
    f[NP:2*NP] = fij.sum(axis=0)
    f[NP] = f[NP] + lbound
    f[2*NP-1] = f[2*NP-1] + rbound
    toc = time.process_time()
    print(toc - tic)
    return f
In both, the input r is a one-dimensional NumPy array of length 20 (2*NP), and as you can see I am using Numba to accelerate both codes. Surprisingly, in this case the vectorized code is about 5 times slower than the for-loop version. I have seen a similar problem discussed in posts like this one: Numpy: Single loop vectorized code slow compared to two loop iteration. However, the problem there was the huge size of the arrays, and as you can see no big arrays are involved in my case. Does anybody know the reason, and how it might be solved?
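As a side note on the measurement itself: the tic/toc inside the function also counts whatever overhead the first call incurs (with Numba, that includes JIT compilation). A fairer comparison calls each function once to warm it up and then times many repeated calls from the outside, e.g. with `timeit`. Below is a minimal sketch of that idea using plain NumPy without the Numba decorator (the function names `ode_loop`/`ode_vec` and the test input are my own, not from the question):

```python
import timeit

import numpy as np

NP, s, l = 10, 1, 100  # same constants as in the question


def ode_loop(r):
    # the for-loop version, with the decorator and internal tic/toc removed
    f = np.zeros(len(r))
    lbound = -4*(12*s**12/(-0.5*l-r[0])**13 - 6*s**6/(-0.5*l-r[0])**7)
    rbound = -4*(12*s**12/(0.5*l-r[NP-1])**13 - 6*s**6/(0.5*l-r[NP-1])**7)
    f[0:NP] = r[NP:2*NP]
    for i in range(NP):
        fi = 0.0
        for j in range(NP):
            if j != i:
                fi += -4*(12*s**12/(r[j]-r[i])**13 - 6*s**6/(r[j]-r[i])**7)
        f[i+NP] = fi
    f[NP] += lbound
    f[2*NP-1] += rbound
    return f


def ode_vec(r):
    # the vectorized version; errstate silences the harmless divide/invalid
    # warnings from the zero diagonal of rij, which is overwritten right after
    f = np.zeros(len(r))
    lbound = -4*(12*s**12/(-0.5*l-r[0])**13 - 6*s**6/(-0.5*l-r[0])**7)
    rbound = -4*(12*s**12/(0.5*l-r[NP-1])**13 - 6*s**6/(0.5*l-r[NP-1])**7)
    f[0:NP] = r[NP:2*NP]
    rij = np.subtract.outer(r[0:NP], r[0:NP])
    with np.errstate(divide="ignore", invalid="ignore"):
        fij = -4*(12*s**12/rij**13 - 6*s**6/rij**7)
    fij[np.diag_indices(NP)] = 0
    f[NP:2*NP] = fij.sum(axis=0)
    f[NP] += lbound
    f[2*NP-1] += rbound
    return f


# made-up test input: NP distinct positions followed by NP velocities
r = np.concatenate([np.linspace(-45.0, 45.0, NP), np.zeros(NP)])
assert np.allclose(ode_loop(r), ode_vec(r))  # both versions agree

# time many repeated calls from the outside; with the Numba decorators on,
# you would call each function once first so compilation is not measured
t_loop = timeit.timeit(lambda: ode_loop(r), number=2000)
t_vec = timeit.timeit(lambda: ode_vec(r), number=2000)
print(f"loop: {t_loop:.4f}s  vectorized: {t_vec:.4f}s  (2000 calls each)")
```

This does not change the underlying question, but it separates the cost of one call of each implementation from the one-time compilation overhead, so the 5x figure can be confirmed on steady-state calls only.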