
Let x be an unevenly spaced strictly increasing array (e.g., x = np.array([0, .1, .3, .7, .99])). I have the following spline functions:

from scipy.interpolate import InterpolatedUnivariateSpline
a_ = InterpolatedUnivariateSpline(x, a)
b_ = InterpolatedUnivariateSpline(x, b)
c_ = InterpolatedUnivariateSpline(x, c)
d_ = InterpolatedUnivariateSpline(x, d)
e_ = InterpolatedUnivariateSpline(x, e)
f_ = InterpolatedUnivariateSpline(x, f)
g_ = InterpolatedUnivariateSpline(x, g)
w_ = InterpolatedUnivariateSpline(x, w)

which are built from numpy arrays a, b, c, d, e, f, g, w of the same length as x.
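
For a self-contained example (the comments below ask for one), the inputs can be faked with random stand-in data; n, the seed, and the generated values here are placeholders for my real arrays:

import numpy as np

rng = np.random.default_rng(0)             # fixed seed, for reproducibility
n = 4800                                   # length of x in my real problem
steps = rng.uniform(0.01, 1.0, n)
x = np.cumsum(steps) / np.sum(steps)       # strictly increasing, unevenly spaced
a, b, c, d, e, f, g, w = (rng.standard_normal(n) for _ in range(8))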

Now let w be an array with shape (2, len(x)) (note that this reuses the name of the 1-D w above). I then run the following loop:

y1 = []
y2 = []
for ii in range(len(x)): 
    xi = x[ii]
    
    A_1 = np.array([[  a_(xi), -a_(xi)], 
                    [       0, -b_(xi)]])
    
    B_1 = np.array([[ -a_(xi),      0], 
                    [  b_(xi),      0]])
    
    C_1 = np.array([[       1,      0], 
                    [  c_(xi),      0]])
    
    D_1 = np.array([[       1,      0], 
                    [ -d_(xi),      1]])
    
    D_2 = np.array([[      -1,      0], 
                    [       0,  e_(xi)]])
    
    Amat = A_1 + B_1 @ np.linalg.inv(D_1) @ C_1
    Bmat = B_1 @ np.linalg.inv(D_1) @ D_2 
    Cmat = np.linalg.inv(D_1) @ C_1
    Dmat = np.linalg.inv(D_1) @ D_2

    K1 = np.array([f_(xi), g_(xi)])

    y1 += [-Amat.T@w  - Cmat.T@K1]
    y2 += [ Dmat.T@K1 + Bmat.T@w ]

This works; however, it is very slow, as x is large and I need to run this loop many times.

My feeling is that I should be able to vectorize all of this code, and that it would then be very fast. However, when I try to do so, I run into problems with the @ matrix multiplications:

MemoryError: Unable to allocate array with shape (4800, 4800, 4800) and data type float64

Here is my attempt, with the code that produces that error:

nil = np.zeros(len(x))
A_1 = np.array([[ a_(x), -a_(x)], 
                [   nil, -b_(x)]])

B_1 = np.array([[-a_(x),    nil], 
                [ b_(x),    nil]])

C_1 = np.array([[ 1+nil,    nil], 
                [ c_(x),    nil]])

D_1 = np.array([[ 1+nil,    nil], 
                [-d_(x),  1+nil]])

D_2 = np.array([[ nil-1,    nil], 
                [   nil,  e_(x)]])

Amat = A_1 + B_1 @ np.linalg.pinv(D_1) @ C_1
Bmat = B_1 @ np.linalg.pinv(D_1) @ D_2
Cmat = np.linalg.pinv(D_1) @ C_1
Dmat = np.linalg.pinv(D_1) @ D_2

K1 = np.array([f_(x), g_(x)])

Cmat.T@K1 # produces the error 

At a basic level, I can move the creation of the A_1, etc. matrices outside of the loop, and that speeds it up. However, is it possible to speed it up even more?
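
For concreteness, here is roughly what that basic version looks like: the spline evaluations are hoisted out of the loop (each call such as a_(x) vectorizes over all of x) and inv(D_1) is computed once per point instead of four times. The w[:, ii] indexing is my guess at what the loop intends; this is an untested sketch:

import numpy as np

# evaluate each spline once over the whole grid, not twice per iteration
ax, bx, cx, dx, ex, fx, gx = (s(x) for s in (a_, b_, c_, d_, e_, f_, g_))

y1, y2 = [], []
for ii in range(len(x)):
    A_1 = np.array([[ ax[ii], -ax[ii]],
                    [      0, -bx[ii]]])
    B_1 = np.array([[-ax[ii],       0],
                    [ bx[ii],       0]])
    C_1 = np.array([[      1,       0],
                    [ cx[ii],       0]])
    D_1 = np.array([[      1,       0],
                    [-dx[ii],       1]])
    D_2 = np.array([[     -1,       0],
                    [      0,  ex[ii]]])

    D_1inv = np.linalg.inv(D_1)     # one inverse per point instead of four
    Amat = A_1 + B_1 @ D_1inv @ C_1
    Bmat = B_1 @ D_1inv @ D_2
    Cmat = D_1inv @ C_1
    Dmat = D_1inv @ D_2

    K1 = np.array([fx[ii], gx[ii]])
    wi = w[:, ii]                   # guess: the ii-th column of w
    y1.append(-Amat.T @ wi - Cmat.T @ K1)
    y2.append( Dmat.T @ K1 + Bmat.T @ wi)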

rhombidodecahedron
  • Since you do matrix inversion, I don't think using a "large" matrix will lead to faster code. What is the time difference without the `Cmat.T@K1` line (and without the equivalent code in the first approach)? – Jan Stránský Aug 18 '20 at 13:49
  • What's the shape of the arrays? Is the shape in the error expected? – hpaulj Aug 18 '20 at 14:07
  • Can you add a full working example? It looks like there is something wrong: `y1 += [-Amat.T@w - Cmat.T@K1]`. The interpolation calls a_(x), etc. can easily be vectorized, and large speedups are certainly possible: you have loads of temporary arrays and are calling BLAS routines with tiny arrays (https://stackoverflow.com/a/49613797/4045774). With a fully working example (including all inputs, which could be random) it should be straightforward to drastically speed this up using Cython or Numba without any additional memory overhead. – max9111 Aug 18 '20 at 15:40
  • @max, full working examples involving memory errors are a pain to work with: different memory sizes, long run times, etc. – hpaulj Aug 18 '20 at 17:39
  • @hpaulj the shape in the error is not expected. The arrays are (2,2,4800), where 4800 is the length of x. In the loop example, each iteration (over the 4800) builds these little 2x2 matrices. – rhombidodecahedron Aug 18 '20 at 19:36
  • `matmul`/`@` treats the first (of 3) dimensions as the 'batch', doing a 'dot' on the last two. As with `np.dot`, the basic rule is "the last dimension of A is paired with the 2nd-to-last of B" - the classic across-rows, down-columns of matrix multiplication. Sounds like you want to work with (4800,2,2) shaped arrays (see the sketch after these comments). – hpaulj Aug 18 '20 at 19:41
  • @hpaulj Thanks - that gives some progress; by doing `A_1 = A_1.reshape(-1,2,2)` I can apparently do `Cmat@K1` (notice, however, the absence of the transpose from Cmat now). However, the resulting shape is (4800,2,4800) and not (2,4800) as I am expecting. Do you have any advice? – rhombidodecahedron Aug 18 '20 at 20:31
  • You may want transpose instead of reshape. – hpaulj Aug 18 '20 at 20:57
  • @hpaulj I still run into the same basic issue: I cannot multiply these (4800,2,2) arrays in the same way as in the loop to get a (2,4800) array. – rhombidodecahedron Aug 18 '20 at 22:49
  • Don't you want (4800,2)? 4800 is your outer, batch dimension. – hpaulj Aug 18 '20 at 23:37
  • @hpaulj Yes, sure, either way - as long as it is not three dimensional! – rhombidodecahedron Aug 19 '20 at 07:30
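
Following hpaulj's comments, here is a sketch of the batch-first layout: put len(x) on the leading axis so every stacked matrix is (n, 2, 2), let np.linalg.inv and @ broadcast over that axis, and use einsum for the per-point transposed products. As in the loop sketch above, using w.T (i.e. w[:, ii] per point) is my guess at the intended use of w, and this is unverified against the loop:

import numpy as np

ax, bx, cx, dx, ex, fx, gx = (s(x) for s in (a_, b_, c_, d_, e_, f_, g_))
n = len(x)
one, nil = np.ones(n), np.zeros(n)

def stack2x2(m00, m01, m10, m11):
    # assemble an (n, 2, 2) stack: batch axis first, matrix axes last
    return np.stack([np.stack([m00, m01], axis=-1),
                     np.stack([m10, m11], axis=-1)], axis=-2)

A_1 = stack2x2( ax,  -ax, nil, -bx)
B_1 = stack2x2(-ax,  nil,  bx, nil)
C_1 = stack2x2(one,  nil,  cx, nil)
D_1 = stack2x2(one,  nil, -dx, one)
D_2 = stack2x2(-one, nil, nil,  ex)

D_1inv = np.linalg.inv(D_1)       # batched: inverts each 2x2 slice
Amat = A_1 + B_1 @ D_1inv @ C_1   # @ broadcasts over the leading n axis
Bmat = B_1 @ D_1inv @ D_2
Cmat = D_1inv @ C_1
Dmat = D_1inv @ D_2

K1 = np.stack([fx, gx], axis=-1)  # (n, 2)
wT = w.T                          # (n, 2)

# 'nji,nj->ni' contracts over j, i.e. the per-point product M[k].T @ v[k]
y1 = -np.einsum('nji,nj->ni', Amat, wT) - np.einsum('nji,nj->ni', Cmat, K1)
y2 =  np.einsum('nji,nj->ni', Dmat, K1) + np.einsum('nji,nj->ni', Bmat, wT)
# y1 and y2 come out as (n, 2): one 2-vector per grid point

The (2, 2, n) layout in the failed attempt breaks because .T reverses all three axes and @ then broadcasts the operands against each other, which is how a temporary like the (4800, 4800, 4800) one in the MemoryError can arise.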

0 Answers