I have a sum of sums that I want to speed up. In one case it is:
S_{x,y,k,l} Fu_{ku} Fv_{lv} Fx_{kx} Fy_{ly}
In the other case it is:
S_{x,y} ( S_{k,l} Fu_{ku} Fv_{lv} Fx_{kx} Fy_{ly} )^2
Note: S_{indices}: is the sum over those indices
The first case I have figured out how to do using numpy's einsum
and it results in an amazing speedup ~ x160.
Also, I have thought of trying to expand the square but won't that be a killer as I would need to sum over x,y,k,l,k,l instead of x,y,k,l?
Here is an implementation that demonstrates the difference and the solution I have with einsum
.
Nx = 3
Ny = 4
Nk = 5
Nl = 6
Nu = 7
Nv = 8
Fx = np.random.rand(Nx, Nk)
Fy = np.random.rand(Ny, Nl)
Fu = np.random.rand(Nu, Nk)
Fv = np.random.rand(Nv, Nl)
P = np.random.rand(Nx, Ny)
B = np.random.rand(Nk, Nl)
I1 = np.zeros([Nu, Nv])
I2 = np.zeros([Nu, Nv])
t = time.time()
for iu in range(Nu):
for iv in range(Nv):
for ix in range(Nx):
for iy in range(Ny):
S = 0.
for ik in range(Nk):
for il in range(Nl):
S += Fu[iu,ik]*Fv[iv,il]*Fx[ix,ik]*Fy[iy,il]*P[ix,iy]*B[ik,il]
I1[iu, iv] += S
I2[iu, iv] += S**2.
print time.time() - t; t = time.time()
# 0.0787379741669
I1_ = np.einsum('uk, vl, xk, yl, xy, kl->uv', Fu, Fv, Fx, Fy, P, B)
print time.time() - t
# 0.00049090385437
print np.allclose(I1_, I1)
# True
# Solution by expanding the square (not ideal)
t = time.time()
I2_ = np.einsum('uk,vl,xk,yl,um,vn,xm,yn,kl,mn,xy->uv', Fu,Fv,Fx,Fy,Fu,Fv,Fx,Fy,B,B,P**2)
print time.time() - t
# 0.0226809978485 <- faster than for loop but still much slower than I1_ einsum
print np.allclose(I2_, I2)
# True
As shown I've managed to do I1_ with I've figured out how to do the above with einsum
for I1
.
EDIT:
I added how to do I2_
by expanding the square but the speed up is a bit disappointing and to be expected... ~x3.47 speedup compared to ~x160
EDIT2:
The speedups don't seem to be consistent, I had gotten before a x40 and an x1.2 but now get different numbers. Either way the difference and the question remain.
EDIT3: I tried to simplify the sum I'm actually after but messed up and the sum above allows for the excellent answer provided by @user5402.
I've edited the code above to demonstrate the sum which is below:
I1 = S_{x,y,k,l} Fu_{ku} Fv_{lv} Fx_{kx} Fy_{ly} P_{xy} B_{kl}
I2 = S_{x,y} ( S_{k,l} Fu_{ku} Fv_{lv} Fx_{kx} Fy_{ly} P_{xy} B_{kl} )^2