In [99]: X = np.array([1,5,2,4,5,25,10,20,5,25,10,20]).reshape(3,2,2)
In [100]: X
Out[100]:
array([[[ 1,  5],
        [ 2,  4]],

       [[ 5, 25],
        [10, 20]],

       [[ 5, 25],
        [10, 20]]])
In [101]: prod=np.identity(2)
In [102]: for t in X:
...: prod=np.dot(prod,t)
...:
In [103]: prod
Out[103]:
array([[1525., 3875.],
       [1550., 3850.]])
(The result is float because np.identity produced a float array.) With the matmul operator:
In [104]: X[0]@X[1]@X[2]
Out[104]:
array([[1525, 3875],
       [1550, 3850]])
A chained dot function:
In [106]: np.linalg.multi_dot(X)
Out[106]:
array([[1525, 3875],
       [1550, 3850]])
Check its docs and its code to see how it tries to optimize the calculation.
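Its docstring says it picks the cheapest parenthesization with the classic matrix-chain-order dynamic program. Here is a minimal sketch of that idea (my own reconstruction, not multi_dot's actual code):

import numpy as np

def chain_order(dims):
    # matrix i has shape (dims[i], dims[i+1]); n matrices in the chain
    n = len(dims) - 1
    m = [[0] * n for _ in range(n)]   # m[i][j]: minimal scalar-multiply cost for chain i..j
    s = [[0] * n for _ in range(n)]   # s[i][j]: split point achieving that cost
    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            best = None
            for k in range(i, j):
                cost = m[i][k] + m[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                if best is None or cost < best:
                    best, s[i][j] = cost, k
            m[i][j] = best
    return s

def multiply_chain(mats, s, i, j):
    # multiply mats[i..j] following the precomputed split points
    if i == j:
        return mats[i]
    k = s[i][j]
    return multiply_chain(mats, s, i, k) @ multiply_chain(mats, s, k + 1, j)

mats = [np.ones((10, 100)), np.ones((100, 5)), np.ones((5, 50))]
s = chain_order([10, 100, 5, 50])
result = multiply_chain(mats, s, 0, len(mats) - 1)   # does (mats[0]@mats[1])@mats[2]

Note that when every matrix in the chain has the same square shape, as in this (n,2,2) X, every parenthesization costs the same number of scalar multiplies, so the optimization can't buy anything here.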
Related SO questions:
How is numpy multi_dot slower than numpy.dot?
numpy large matrix multiplication optimize
Multiply together list of matrices in Numpy
If I expand X several times, so the chain has 48 matrices:
In [147]: X.shape
Out[147]: (48, 2, 2)
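The expansion step isn't shown above; one way to build such an X from the original (3,2,2) array (an assumed reconstruction) is tiling:

X = np.tile(X, (16, 1, 1))   # stack 16 copies of the 3 matrices: (3,2,2) -> (48,2,2)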
The cost of one dot product:
In [148]: timeit X[0]@X[1]
5.08 µs ± 14.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
And a back-of-envelope estimate for 48 of them, in µs:
In [149]: 5*48
Out[149]: 240
Compare that to the cost of iteratively evaluating chained dot:
In [150]: %%timeit
     ...: Y = np.eye(X.shape[1]).astype(X.dtype)
     ...: for i in X:
     ...:     Y = Y@i
     ...:
227 µs ± 5.19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So chaining doesn't cost anything beyond evaluating all the required dots.
I tried a recursive function that does nested dots (a sketch follows) and didn't get any improvement.
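That recursive function isn't shown; a minimal reconstruction of the idea, splitting the chain in half and combining the halves with one dot:

import numpy as np

def rdot(mats):
    # divide-and-conquer: reduce each half of the chain, then combine
    n = len(mats)
    if n == 1:
        return np.asarray(mats[0])
    return rdot(mats[:n // 2]) @ rdot(mats[n // 2:])

rdot(list(X))   # same result as the sequential loop

It still performs the same 47 (2,2)@(2,2) products as the loop, so there is nothing for it to save.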
Unless you can, mathematically, come up with an alternative to evaluating the dots sequentially, I doubt there's room for improvement. Using something like Cython to directly call the underlying BLAS functions might be the only alternative. It won't improve on the individual dots, but it could reduce some of the calling overhead.
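Short of Cython, SciPy does expose the BLAS routines directly, which at least shows the call such a version would wrap (a sketch; the Python loop here still pays the per-call overhead that a Cython version would avoid):

import numpy as np
from scipy.linalg.blas import dgemm

def chain_blas(X):
    # reduce the chain with direct BLAS dgemm calls: Y = 1.0 * Y @ t
    X = np.asarray(X, dtype=np.float64)
    Y = X[0]
    for t in X[1:]:
        Y = dgemm(1.0, Y, t)
    return Y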