
I'm using numpy (ideally Numba) to perform a tensor contraction that involves three tensors, one of which is a vector that should multiply only one axis of the other two. For example:

A = np.random.normal(size=(20,20,20,20))
B = np.random.normal(size=(20,20,20,20))
v = np.sqrt(np.arange(20))

# e.g. v on the 3rd index
>>> %timeit np.vdot(A * v[None, None, :, None], B)
125 µs ± 5.14 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Compare with multiplying by a full tensor:

C = np.random.normal(size=(20,20,20,20))

>>> %timeit np.vdot(A * C, B)
76.8 µs ± 4.25 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Is there a more efficient way of including the product with v? It feels wrong that it should be slower than multiplying by the full tensor C.

Ziofil
  • `np.einsum('ijkl,k,ijkl', A, v, B, optimize='optimal').item()` — I don't know why the result is a single-item array with optimization (see the sketch after these comments). – Michael Szczesny Aug 17 '22 at 19:24
  • I should have said I wanted to avoid `einsum` because it's not supported in numba. Is it possible to do the same as what einsum is doing, but without einsum? – Ziofil Aug 17 '22 at 19:28
  • Maybe why the first case is slower than the second can be understood from this post: https://stackoverflow.com/questions/48253210/broadcasted-numpy-arithmetic-why-is-one-method-so-much-more-performant –  Aug 18 '22 at 00:26
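
For reference, here is the einsum one-liner from the first comment, checked against the broadcasting version from the question (a minimal sketch using the question's 20^4 arrays; the `.item()` is needed because einsum with `optimize=` set can return a 0-d array rather than a scalar):

import numpy as np

A = np.random.normal(size=(20, 20, 20, 20))
B = np.random.normal(size=(20, 20, 20, 20))
v = np.sqrt(np.arange(20))

# Contract all four indices of A and B, weighting the third index by v.
res = np.einsum('ijkl,k,ijkl', A, v, B, optimize='optimal').item()

ref = np.vdot(A * v[None, None, :, None], B)
print(np.isclose(res, ref))  # True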

1 Answer


I could squeeze out some performance by using Numba with `parallel=True`:

import numba as nb
import numpy as np

N = 50

# Accumulate v[k] * <A[i,j,k,:], B[i,j,k,:]> directly, so the temporary
# A * v is never materialized; Numba parallelizes the outer loop and
# treats `res += ...` as a reduction.
@nb.njit('float64(float64[:,:,:,:], float64[:,:,:,:], float64[:])', parallel=True)
def dotABv(a, b, vv):
    res = 0.0
    for i in nb.prange(a.shape[0]):
        for j in range(a.shape[1]):
            for k in range(a.shape[2]):
                res += vv[k] * np.dot(a[i, j, k, :], b[i, j, k, :])
    return res

v = np.sqrt(np.arange(N))
A = np.random.normal(size=(N,N,N,N))
B = np.random.normal(size=(N,N,N,N))
C = np.random.normal(size=(N,N,N,N))

%timeit dotABv(A, B, v)
%timeit np.vdot(A, B)  # just to compare with a plain dot
%timeit np.vdot(A * v[None, None, :, None], B)
# Output :
# 501 µs ± 1.42 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# 1.1 ms ± 111 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# 14.7 ms ± 239 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

print(dotABv(A, B, v), np.vdot(A * v[None, None, :, None], B))
# 5105.504508154087 5105.5045081541075
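
As an aside, the same contraction can also be written in pure NumPy without einsum, by reducing A * B over every axis except the third and then contracting the remaining axis with v. A sketch under the same setup as above; not benchmarked here, and note it still forms the full A * B temporary:

# Equivalent pure-NumPy formulation: sum A*B over axes 0, 1 and 3 first,
# then dot the remaining length-N axis with v.
res_alt = (A * B).sum(axis=(0, 1, 3)) @ v
print(np.isclose(res_alt, dotABv(A, B, v)))  # True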