In [99]: X = np.array([1,5,2,4,5,25,10,20,5,25,10,20]).reshape(3,2,2)
In [100]: X
Out[100]:
array([[[ 1,  5],
        [ 2,  4]],

       [[ 5, 25],
        [10, 20]],

       [[ 5, 25],
        [10, 20]]])
In [101]: prod=np.identity(2)
In [102]: for t in X:
...: prod=np.dot(prod,t)
...:
In [103]: prod
Out[103]:
array([[1525., 3875.],
       [1550., 3850.]])
(The result is float because np.identity produced a float array.) With the matmul operator:
In [104]: X[0]@X[1]@X[2]
Out[104]:
array([[1525, 3875],
       [1550, 3850]])
A chained dot function:
In [106]: np.linalg.multi_dot(X)
Out[106]:
array([[1525, 3875],
       [1550, 3850]])
Check its docs and its code to see how it tries to optimize the calculation.
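Its docstring says it picks the cheapest parenthesization with the classic matrix-chain-order dynamic program. Here is a minimal sketch of that idea (my own reconstruction, not multi_dot's actual code):

import numpy as np

def chain_order(dims):
    # matrix i has shape (dims[i], dims[i+1]); n matrices in the chain
    n = len(dims) - 1
    m = [[0] * n for _ in range(n)]   # m[i][j]: minimal scalar-multiply cost for chain i..j
    s = [[0] * n for _ in range(n)]   # s[i][j]: split point achieving that cost
    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            best = None
            for k in range(i, j):
                cost = m[i][k] + m[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                if best is None or cost < best:
                    best, s[i][j] = cost, k
            m[i][j] = best
    return s

def multiply_chain(mats, s, i, j):
    # multiply mats[i..j] following the precomputed split points
    if i == j:
        return mats[i]
    k = s[i][j]
    return multiply_chain(mats, s, i, k) @ multiply_chain(mats, s, k + 1, j)

mats = [np.ones((10, 100)), np.ones((100, 5)), np.ones((5, 50))]
s = chain_order([10, 100, 5, 50])
result = multiply_chain(mats, s, 0, len(mats) - 1)   # does (mats[0]@mats[1])@mats[2]

Note that when every matrix in the chain has the same square shape, as in this (n,2,2) X, every parenthesization costs the same number of scalar multiplies, so the optimization can't buy anything here.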
Related SO questions:
How is numpy multi_dot slower than numpy.dot?
numpy large matrix multiplication optimize
Multiply together list of matrices in Numpy
If I expand X several times, so the chain has 48 matrices:
In [147]: X.shape
Out[147]: (48, 2, 2)
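The expansion step isn't shown above; one way to build such an X from the original (3,2,2) array (an assumed reconstruction) is tiling:

X = np.tile(X, (16, 1, 1))   # stack 16 copies of the 3 matrices: (3,2,2) -> (48,2,2)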
The cost of one dot product:
In [148]: timeit X[0]@X[1]
5.08 µs ± 14.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
And a back-of-envelope estimate for 48 of them, in µs:
In [149]: 5*48
Out[149]: 240
Compare that to the cost of iteratively evaluating chained dot:
In [150]: %%timeit
     ...: Y = np.eye(X.shape[1]).astype(X.dtype)
     ...: for i in X:
     ...:     Y = Y@i
     ...:
227 µs ± 5.19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So chaining doesn't cost anything beyond evaluating all the required dots.
I tried a recursive function that does nested dots (a sketch follows) and didn't get any improvement.
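That recursive function isn't shown; a minimal reconstruction of the idea, splitting the chain in half and combining the halves with one dot:

import numpy as np

def rdot(mats):
    # divide-and-conquer: reduce each half of the chain, then combine
    n = len(mats)
    if n == 1:
        return np.asarray(mats[0])
    return rdot(mats[:n // 2]) @ rdot(mats[n // 2:])

rdot(list(X))   # same result as the sequential loop

It still performs the same 47 (2,2)@(2,2) products as the loop, so there is nothing for it to save.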
Unless you can, mathematically, come up with an alternative to evaluating the dots sequentially, I doubt there's room for improvement. Using something like Cython to directly call the underlying BLAS functions might be the only alternative. It won't improve on the individual dots, but it could reduce some of the calling overhead.
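Short of Cython, SciPy does expose the BLAS routines directly, which at least shows the call such a version would wrap (a sketch; the Python loop here still pays the per-call overhead that a Cython version would avoid):

import numpy as np
from scipy.linalg.blas import dgemm

def chain_blas(X):
    # reduce the chain with direct BLAS dgemm calls: Y = 1.0 * Y @ t
    X = np.asarray(X, dtype=np.float64)
    Y = X[0]
    for t in X[1:]:
        Y = dgemm(1.0, Y, t)
    return Y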