  • I have three sets of matrices {A_i}, {B_i}, and {C_i} with n matrices in each set
  • The A_i are of dimension l x m, the B_i are of dimension m x o and the C_i are of dimension p x q
  • I would like to compute the sum of Kronecker products: result = Σ_i kron(A_i B_i, C_i)

Here is a concrete example for what I am after

import numpy as np

A = np.arange(12).reshape(2,3,2)
B = np.arange(12,24).reshape(2,2,3)
C = np.arange(32).reshape(2,4,4)

result = np.zeros((12,12))
for i in range(2):
    result += np.kron(A[i,:,:] @ B[i,:,:], C[i,:,:])

How can I implement this more efficiently?

Many thanks for your help!

1 Answer


As suggested, I had a look into numpy.einsum. This turned out to be quite nice. A solution is:

np.einsum('ijk,imn->jmkn', np.einsum('ijk,ikm->ijm', A, B), C).reshape(A.shape[1] * C.shape[1], B.shape[2] * C.shape[2])
  • The inner np.einsum() computes the batched matrix products A_i B_i, i.e. a 3D array whose i-th "slice" is the 2D product of the i-th slices of A and B
  • The outer np.einsum() mimics (after appropriate reshaping) the Kronecker product of this 3D array with C, summed over i
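The two steps above can be checked against the loop from the question (this is just a verification sketch using the example arrays, not part of the original answer):

```python
import numpy as np

A = np.arange(12).reshape(2, 3, 2)
B = np.arange(12, 24).reshape(2, 2, 3)
C = np.arange(32).reshape(2, 4, 4)

# Reference: explicit loop summing kron(A_i @ B_i, C_i)
expected = np.zeros((12, 12))
for i in range(2):
    expected += np.kron(A[i] @ B[i], C[i])

# Inner einsum: batched matrix products A_i @ B_i, shape (2, 3, 3)
AB = np.einsum('ijk,ikm->ijm', A, B)

# Outer einsum: sums over i and orders axes as (j, m, k, n) so that the
# reshape lays out the blocks exactly like a Kronecker product
result = np.einsum('ijk,imn->jmkn', AB, C).reshape(
    A.shape[1] * C.shape[1], B.shape[2] * C.shape[2]
)

print(np.array_equal(result, expected))  # True
```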

I found the following two posts very helpful:

  • For larger arrays it's faster to combine both `einsum` calls and `optimize` the contraction with `np.einsum('ijk,ikm,ino->jnmo', A, B, C, optimize='optimal').reshape(A.shape[1] * C.shape[1], -1)`. Using `float64` arrays speeds up by ~5x, `float32` by ~10x because `int` arrays aren't using parallel BLAS calls with `numpy`. Might be worth converting to `float`. – Michael Szczesny Sep 22 '22 at 21:25
  • Thanks. The speedup from this - for larger arrays - is incredible! – Sebastian Hohmann Sep 23 '22 at 08:21
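The combined contraction from the comment can be sketched as follows (a verification against the question's loop, with the optional conversion to `float64` the comment suggests):

```python
import numpy as np

# Example arrays from the question, converted to float64 so NumPy can
# dispatch the matrix products to parallel BLAS routines
A = np.arange(12).reshape(2, 3, 2).astype(np.float64)
B = np.arange(12, 24).reshape(2, 2, 3).astype(np.float64)
C = np.arange(32).reshape(2, 4, 4).astype(np.float64)

# Single einsum call: contracts A_i @ B_i and the Kronecker structure
# in one optimized contraction, summing over i
result = np.einsum('ijk,ikm,ino->jnmo', A, B, C,
                   optimize='optimal').reshape(A.shape[1] * C.shape[1], -1)

# Reference loop from the question
expected = np.zeros((12, 12))
for i in range(2):
    expected += np.kron(A[i] @ B[i], C[i])

print(np.allclose(result, expected))  # True
```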