I'm using numpy (ideally Numba) to perform a tensor contraction that involves three tensors, one of which is a vector that should multiply only one index of the others. For example,
A = np.random.normal(size=(20,20,20,20))
B = np.random.normal(size=(20,20,20,20))
v = np.sqrt(np.arange(20))
# e.g. v on the 3rd index
>>> %timeit np.vdot(A * v[None, None, :, None], B)
125 µs ± 5.14 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
compare with
C = np.random.normal(size=(20,20,20,20))
>>> %timeit np.vdot(A * C, B)
76.8 µs ± 4.25 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Is there a more efficient way of including the product with v
? It feels wrong that it should be slower than multiplying by the full tensor C
.