
I want to compute the following:

import numpy as np

n = 3
m = 2
x = np.random.randn(n, m)

# Method 1: loop over columns, dotting each column with itself
y = np.zeros(m)
for i in range(m):
    y[i] = x[:, i] @ x[:, i]

# Method 2: full matrix product, then keep only the diagonal
y2 = np.diag(x.T @ x)

The first method has the problem that it uses a for loop, which can't be very efficient (I need to do this in PyTorch on a GPU millions of times).

The second method computes the full matrix product, when I only need the diagonal entries, so that can't be very efficient either.

I'm wondering whether there exists any clever way of doing this?

    With `np.einsum` - https://stackoverflow.com/questions/48066890/ – Divakar May 08 '20 at 21:00
  • @Divakar. I'd argue that that's overkill, although it probably creates fewer temp arrays than the trivial solution I posted. – Mad Physicist May 08 '20 at 21:04
  • @MadPhysicist Why would it be overkill? – Divakar May 08 '20 at 21:07
  • @Divakar. Because it's not as obvious (to me), and I wasn't really thinking about it. I've closed the question since your answer already covers all the reasonable solutions I could think of. – Mad Physicist May 08 '20 at 21:09
  • Well yeah, avoiding the need for temp arrays lends itself to an efficient solution with `einsum` as well. – Divakar May 08 '20 at 21:10
  • `(x.T[:,None,:]@(x.T[:,:,None]))[:,0,0]` may be a bit faster, though it looks a lot messier. `@` broadcasts on the first of 3 dimensions. – hpaulj May 08 '20 at 22:26
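For reference, hpaulj's batched-matmul trick treats each column as a (1, n) row times an (n, 1) column, with `@` broadcasting over the leading dimension. A minimal sketch (variable names follow the question):

```python
import numpy as np

n, m = 3, 2
x = np.random.randn(n, m)

# x.T has shape (m, n); the singleton dims make @ perform m independent
# (1, n) @ (n, 1) products, broadcasting over the first dimension
y = (x.T[:, None, :] @ x.T[:, :, None])[:, 0, 0]

# Same values as the diagonal of the full product
assert np.allclose(y, np.diag(x.T @ x))
```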

1 Answer


Use a manually constructed sum-product. You want the sums of the squares of the individual columns:

y = (x * x).sum(axis=0)
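A quick sanity check that this matches the loop-based Method 1 from the question:

```python
import numpy as np

n, m = 3, 2
x = np.random.randn(n, m)

# Loop-based reference: dot each column with itself
y_loop = np.zeros(m)
for i in range(m):
    y_loop[i] = x[:, i] @ x[:, i]

# Vectorized sum-product: square elementwise, sum over the rows
y_vec = (x * x).sum(axis=0)

assert np.allclose(y_loop, y_vec)
```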

As Divakar suggests, `np.einsum` will likely offer a less memory-intensive option, since it does not require the temporary array `x * x`:

y = np.einsum('ij,ij->j', x, x)
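Since the question mentions PyTorch on a GPU: both forms carry over directly, assuming a tensor `x` of shape `(n, m)` (a sketch, not benchmarked):

```python
import torch

n, m = 3, 2
x = torch.randn(n, m)

# Elementwise square, then sum over the rows (dim 0)
y = (x * x).sum(dim=0)

# Equivalent einsum form; torch.einsum uses the same notation as np.einsum
y2 = torch.einsum('ij,ij->j', x, x)

assert torch.allclose(y, y2)
```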
Mad Physicist