
I want to efficiently perform a chain of matrix-vector multiplications in Python, and the numpy.einsum function seems to be the best choice. However, I do NOT know the number of matrix operands N in the chain a priori, so there is no simple way to write the subscripts string to pass to einsum.

To make this concrete, consider the simple case with N=3 matrices A, B, and C, and an initial vector v. The numpy.einsum call would be the following:

result = numpy.einsum('ij,jk,kl,l', A, B, C, v)

How can I generalize this operation to an arbitrary N? One possibility would be to build the subscripts string programmatically, but numpy.einsum only supports 52 distinct labels (26 letters, lower and upper case), which is a limitation in my case since I don't know the number of operands a priori.
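For illustration, the programmatic construction might look like the sketch below (the helper name chain_subscripts is mine), which makes the 52-label ceiling explicit:

from string import ascii_letters

def chain_subscripts(n):
    # Build 'ab,bc,cd,...,d' for n matrices followed by one vector.
    if n + 1 > len(ascii_letters):
        raise ValueError('einsum only supports 52 distinct labels')
    labels = ascii_letters[:n + 1]
    return ','.join(a + b for a, b in zip(labels, labels[1:])) + ',' + labels[-1]

# chain_subscripts(3) -> 'ab,bc,cd,d', equivalent to 'ij,jk,kl,l' above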

Or is there a way to do this just as efficiently with another approach, based on a different numpy function?

  • Is it not good enough to do it by induction? Or by induction in blocks of size <= 52? – Learning is a mess Jun 28 '23 at 11:15
  • Mm... What do you mean exactly? – SimoneGasperini Jun 28 '23 at 11:36
  • https://stackoverflow.com/questions/37794245/can-i-use-more-than-26-letters-in-numpy-einsum – dankal444 Jun 28 '23 at 11:50
  • The `245` link explains why, with the original `einsum`, using very many arguments and subscripts was not possible. `einsum` has since been reworked to use a sequence of evaluations, as shown by `einsum_path`. But the old `github` issue still shows a limit on the number of arguments. – hpaulj Jun 28 '23 at 15:45

3 Answers


Here are two recursive solutions. The first is based on "atomic induction", i.e. doing one einsum per matrix multiplication:

import numpy as np

# Four independent random matrices (a tuple-multiplication shorthand would
# bind all four names to the same array).
A, B, C, D = (np.random.rand(10, 10) for _ in range(4))

def recursive_matrix_multiply(*args):
    # Base case: a single remaining operand is the accumulated product.
    if len(args) == 1:
        return args[0]
    # Contract the first two operands; handle a trailing 1-D vector.
    subscripts = 'ij,jk' if args[1].ndim == 2 else 'ij,j'
    return recursive_matrix_multiply(np.einsum(subscripts, *args[:2]), *args[2:])

np.testing.assert_almost_equal(recursive_matrix_multiply(A, B, C, D),
                               np.einsum('ij,jk,kl,lm', A, B, C, D))
np.testing.assert_almost_equal(recursive_matrix_multiply(A, B, C, D[:, 0]),
                               np.einsum('ij,jk,kl,l', A, B, C, D[:, 0]))

And here is a solution based on "block induction", doing one einsum per largest block doable at once. Note that I started with string.ascii_letters to use all 52 letters in the function's vocabulary, but I was running into errors, so I had to reduce it to 11 letters; 15 did work but was slow.

from string import ascii_lowercase
from itertools import pairwise  # Python 3.10+
from functools import reduce

letters = ascii_lowercase[:11]

def build_multiply_einstein_string(length, ends_with_vector):
    # Build 'ab,bc,cd,...' for `length` operands; drop the final label
    # when the last operand is a 1-D vector.
    ein_string = ','.join(''.join(pair) for pair in pairwise(ascii_lowercase[:length + 1]))
    return ein_string[:-1] if ends_with_vector else ein_string

def recursive_matrix_multiply(*args):
    if len(args) == 1:
        return args[0]
    # Contract up to len(letters) - 1 operands at once, then recurse on the rest.
    batch = args[:len(letters) - 1]
    tail = args[len(letters) - 1:]
    return recursive_matrix_multiply(
        np.einsum(build_multiply_einstein_string(len(batch), batch[-1].ndim == 1), *batch),
        *tail)

# testing ending on a matrix
args = [np.random.rand(5,5) for _ in range(60)]
np.testing.assert_allclose( recursive_matrix_multiply(*args), reduce(np.matmul, args))

# testing ending on a vector
args[-1] = args[-1][:,0]
np.testing.assert_allclose( recursive_matrix_multiply(*args), reduce(np.matmul, args))

Note that both cater for the two scenarios:

  • pure 2d matrix multiplication
  • chain of dot products ending on a vector
Learning is a mess

I want to efficiently perform a chain of matrix-vector multiplication in Python and the numpy.einsum function seems to be the best choice.

But the best choice seems to be numpy.linalg.multi_dot, whose documentation says:

Compute the dot product of two or more arrays in a single function call, while automatically selecting the fastest evaluation order.

multi_dot chains numpy.dot and uses optimal parenthesization of the matrices [1] [2]. Depending on the shapes of the matrices, this can speed up the multiplication a lot.

If the first argument is 1-D it is treated as a row vector. If the last argument is 1-D it is treated as a column vector. The other arguments must be 2-D.

Think of multi_dot as:

def multi_dot(arrays):
    return functools.reduce(np.dot, arrays)

This function solves the problem of an unknown number of operands: you simply place your matrices in a list. At the same time, its performance looks better than that of einsum. A relatively casual benchmark follows:

In [_]: rng = np.random.default_rng()

In [_]: A = rng.random((10, 20)); B = rng.random((20, 15))

In [_]: C = rng.random((15, 30)); v = rng.random(30)

In [_]: np.allclose(np.linalg.multi_dot([A, B, C, v]),
   ...:             np.einsum('ij,jk,kl,l', A, B, C, v))
Out[_]: True

In [_]: %timeit np.einsum('ij,jk,kl,l', A, B, C, v)
401 µs ± 894 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [_]: %timeit np.linalg.multi_dot([A, B, C, v])
20.3 µs ± 127 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Mechanic Pig
  • My answer does not mention speed. I have indeed found the einsum solution I proposed to be very slow and very poorly scalable. I was just trying to fit within the constraint of using einsum, in a way totally oblivious to optimizations that could help it. Having said that, if speed is a concern, I suggest using `reduce(np.matmul, ...)`, as I have found it much faster than `multi_dot`. Feel free to benchmark one against the other. – Learning is a mess Jun 28 '23 at 14:35
  • Yes, `reduce(np.matmul, ...)` seems to be the fastest. Thank you for the comment! – SimoneGasperini Jun 28 '23 at 15:15
  • `multi_dot` only helps if the arrays differ in shape: then it can choose an order that reduces the problem space the fastest. If the shapes are all the same or similar, evaluation order doesn't matter. `multi_dot` still ends up doing a sequence of `dot` calls; it doesn't do any higher-order `dot` in compiled code. – hpaulj Jun 28 '23 at 15:55
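To illustrate hpaulj's point about evaluation order, here is a sketch with deliberately mismatched shapes (the shapes are my own choice): a left-to-right fold materializes a large intermediate, while multi_dot parenthesizes to keep intermediates small.

import numpy as np
from functools import reduce

rng = np.random.default_rng()
A = rng.random((1000, 2))   # tall and thin
B = rng.random((2, 1000))   # short and wide
v = rng.random(1000)

# reduce evaluates (A @ B) @ v, building a 1000x1000 intermediate;
# multi_dot evaluates A @ (B @ v), whose intermediates stay tiny.
np.testing.assert_allclose(reduce(np.matmul, [A, B, v]),
                           np.linalg.multi_dot([A, B, v]))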

As mentioned in the comments, you can use the interleaved calling convention (each operand followed by a list of integer axis labels) to sidestep building a subscripts string. Something like this:

np.einsum(*[x for i, a in enumerate(mats) for x in (a, [i, i + 1])], v, [len(mats)])

This should work for up to ~50 operands according to a bugfix from 2018; however, it doesn't work for me above ~30.
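For a self-contained illustration (the names mats and v are mine), the call above can be exercised like this and checked against multi_dot:

import numpy as np

rng = np.random.default_rng()
mats = [rng.random((4, 4)) for _ in range(10)]
v = rng.random(4)

# Matrix i carries the integer labels [i, i+1]; the vector shares the final
# label, so only label 0 survives and the result is a vector.
result = np.einsum(*[x for i, a in enumerate(mats) for x in (a, [i, i + 1])], v, [len(mats)])
np.testing.assert_allclose(result, np.linalg.multi_dot(mats + [v]))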

As suggested by @learning-is-a-mess, you could also do it in blocks: split your list of matrices into blocks of size N, and run each block through einsum with the previous block's output as its first input.
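A sketch of that block approach, using the same interleaved convention (the function name chain_in_blocks and the default block size are my own):

import numpy as np

def chain_in_blocks(operands, block=20):
    # operands: a list of matrices, optionally ending with a vector.
    out = operands[0]
    for i in range(1, len(operands), block):
        chunk = [out] + list(operands[i:i + block])
        # Interleaved mode: each operand followed by its integer labels;
        # a trailing 1-D vector reuses the last label, contracting it away.
        args = []
        for j, a in enumerate(chunk):
            args += [a, [j, j + 1] if a.ndim == 2 else [j]]
        out = np.einsum(*args)  # keep `block` below einsum's operand limit
    return out

# e.g. chain_in_blocks([A, B, C, v]) == A @ B @ C @ v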

The best option, though, is probably just to use opt_einsum. This can significantly speed up your einsum, and also supports arbitrary numbers of indices:

import numpy as np
import opt_einsum

a = np.array([[1]])
# 10,000 chained 1x1 "matrices": far beyond einsum's 52-letter limit.
opt_einsum.contract(*[x for i in range(10000) for x in (a, [i, i + 1])])
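Note that opt_einsum.contract also accepts the usual subscripts-string interface (e.g. opt_einsum.contract('ij,jk,kl,l', A, B, C, v)), so for short chains it is a drop-in replacement for numpy.einsum.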
David