
Suppose I have:

  1. A, a matrix of shape (m, 3). A_i is the i-th row of this matrix.

  2. n matrices S_1, ..., S_n, each one of shape (3, 3)

  3. n vectors v_1, ..., v_n, each one of shape (3,)

I want to calculate the matrix M of shape (m, n), defined by:

M_ij = (A_i - v_j^T) S_j (A_i^T - v_j)

How can I do it efficiently using NumPy, without iterating with Python loops over i and j?

Cider

2 Answers


Assuming you actually have m vectors (not n), you can stack them all using np.stack, producing an (m, 3) matrix. You can then subtract that from A directly (literally using -) and transpose the difference for the last factor (with ndarray.swapaxes). Call the difference X. S can be stacked in a similar way, producing an (n, 3, 3) array. You can then use broadcasting to multiply them, after adding singleton dimensions to X and S:

V = np.stack(vs)                      # (m, 3), vectors stacked row-wise
S = np.stack(Ss)                      # (n, 3, 3)

X = A - V                             # (m, 3)
X = X[:, np.newaxis, np.newaxis, :]   # (m, 1, 1, 3)
S = S[np.newaxis, :, :, :]            # (1, n, 3, 3)
M = X @ S @ X.swapaxes(2, 3)          # (m, n, 1, 1)
M = M.squeeze((2, 3))                 # (m, n)

Edit: You can also do it using np.einsum (applied to the 2-D X and the stacked (n, 3, 3) S, i.e. before the newaxis insertions above), but that may be a bit more complex depending on your background:

# i indexes rows of A, j indexes the S matrices, k and l are the length-3 axes
M = np.einsum('ik,jkl,il->ij', X, S, X)

Edit 2: After reading the accepted answer, I realise I misunderstood the question. Rather than simply deleting my answer, here's the correct answer implemented using np.einsum:

def using_einsum(A, S, V):
    # X[j, i] = A[i] - V[j], shape (n, m, 3)
    X = A[np.newaxis] - V[:, np.newaxis]
    # M[i, j] = X[j, i] . S[j] . X[j, i]
    return np.einsum('jik,jkl,jil->ij', X, S, X)
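
As a quick sanity check, each entry can be compared against the direct quadratic form (a sketch with random data; the sizes and seed here are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 5
A = rng.normal(size=(m, 3))
S = rng.normal(size=(n, 3, 3))
V = rng.normal(size=(n, 3))

M = using_einsum(A, S, V)

# Compare one entry with the formula from the question
i, j = 2, 3
d = A[i] - V[j]
assert np.isclose(M[i, j], d @ S[j] @ d)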
David
  • Thank you for the answer, unfortunately m and n can be different – Cider Apr 26 '23 at 20:31
  • I wasn't suggesting that m and n had to be equal, just that you probably meant m vectors, not n. But I see that may be wrong. Fun problem anyway! – David Apr 27 '23 at 01:03

It is possible to implement a solution using only numpy. First, I assume that your vectors v and matrices S are gathered into numpy arrays V and S, with shapes (n, 3) and (n, 3, 3) respectively.

If not, it is possible to concatenate vectors and matrices using np.concatenate:

S = np.concatenate([s.reshape(1, d, d) for s in [s_1, s_2, ..., s_n]], axis=0)
V = np.concatenate([v.reshape(1, -1) for v in [v_1, v_2, ..., v_n]], axis=0)
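
Equivalently, if the matrices and vectors already live in plain Python lists, np.stack gives the same result a bit more directly (a sketch; the list names and dummy data are just for illustration):

import numpy as np

n = 4
Ss = [np.eye(3) for _ in range(n)]      # hypothetical list of (3, 3) matrices
vs = [np.zeros(3) for _ in range(n)]    # hypothetical list of (3,) vectors

S = np.stack(Ss)   # shape (n, 3, 3)
V = np.stack(vs)   # shape (n, 3)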

I also assume that A is stored in a numpy array of shape (m, 3).

The naive implementation (using for loops) is the following:

def naive(A, S, V):
    assert S.shape[0] == V.shape[0]
    M = np.zeros((A.shape[0], S.shape[0]))

    for i in range(A.shape[0]):
        for j in range(V.shape[0]):
            # M[i, j] = (A[i] - V[j]) S[j] (A[i] - V[j])^T
            diff = A[i] - V[j]
            M[i, j] = np.dot(
                np.dot(
                    diff.reshape(1, -1),
                    S[j],
                ),
                diff.reshape(-1, 1),
            ).item()

    return M

And the optimized one, only using numpy:

def only_numpy(A, S, V):
    m, n = A.shape[0], V.shape[0]
    # diff[i, j] = A[i] - V[j], shape (m, n, 3)
    diff = A.reshape(m, 1, 3) - V.reshape(1, n, 3)
    # (m, n, 1, 3) @ (1, n, 3, 3) @ (m, n, 3, 1) -> (m, n, 1, 1)
    return np.matmul(
        np.matmul(
            diff.reshape(m, n, 1, 3),
            S.reshape(1, n, 3, 3),
        ),
        diff.reshape(m, n, 3, 1),
    ).reshape(m, n)

We factorize the subtraction using numpy broadcasting. We also use matmul, which broadcasts the matrix multiplication over the leading dimensions (see this post).
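
To see the broadcasting at work, here is a tiny shape check (dummy arrays, just for illustration):

import numpy as np

a = np.ones((2, 5, 1, 3))   # plays the role of diff.reshape(m, n, 1, 3)
b = np.ones((1, 5, 3, 3))   # plays the role of S.reshape(1, n, 3, 3)
print((a @ b).shape)        # (2, 5, 1, 3): matmul broadcasts the leading dims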

We can verify that both functions return the same result with:

assert np.allclose(naive(A, S, V), only_numpy(A, S, V))

Concerning the optimisation, I ran some experiments, assuming d = 3 and n = 2*m. The naive, only_numpy and einsum columns report elapsed time in seconds (timed as in this post) and memory in MiB (measured with memory-profiler at its default resolution):

Time in seconds:
       n      m   naive (t)  only_numpy (t)  einsum (t)
0    100    100    0.185373        0.000584    0.000701
1   1000   1000   18.964981        0.044783    0.054029
2  10000  10000         N/A        4.228531    4.098396

Memory in MiB:
       n      m  naive (m)  only_numpy (m)  einsum (m)
0    100    100       63.6            63.6        63.6
1   1000   1000      101.9           109.8       102.0
2  10000  10000        N/A          4689.3      3918.7

The einsum implementation is from David (see the other answer).
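
For reference, a comparison along these lines can be reproduced with a simple harness (a sketch using time.perf_counter, assuming only_numpy and using_einsum from this thread are defined; the exact measurement code from the linked posts may differ):

import time
import numpy as np

def timeit_best(fn, *args, repeats=3):
    # Keep the best of a few runs to reduce noise
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

rng = np.random.default_rng(0)
m = n = 1000
A = rng.normal(size=(m, 3))
S = rng.normal(size=(n, 3, 3))
V = rng.normal(size=(n, 3))

for fn in (only_numpy, using_einsum):
    print(fn.__name__, timeit_best(fn, A, S, V))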

ftorre
  • It works, thank you! Can it possibly run faster using np.einsum as mentioned in @David's answer? – Cider Apr 26 '23 at 20:33
  • Your question interested me, so I compared this with a corrected version of mine. `np.einsum` tends to be a little faster when n and m are large (>=2000 on my computer). I can't go too high because of memory constraints, but testing at 10k x 10k it takes 40% less time. Probably not worth it. However, if you run into memory constraints, `np.einsum` seems to use about 40% less memory. If you're running it repeatedly, you may be able to improve this with `np.einsum_path`. – David Apr 27 '23 at 02:15
  • Can you please explain how to implement this code using np.einsum? – Cider Apr 27 '23 at 06:01
  • I've updated my answer. Not sure if I should delete the rest of it, but anyway. – David Apr 27 '23 at 11:42