
I want to compute the following:

import numpy as np

n = 3
m = 2
x = np.random.randn(n, m)

# Method 1: loop over columns, dotting each column with itself
y = np.zeros(m)
for i in range(m):
    y[i] = x[:, i] @ x[:, i]

# Method 2: full matrix product, then keep only the diagonal
y2 = np.diag(x.T @ x)

The first method has the problem that it uses a for loop, which can't be very efficient (I need to do this in PyTorch on a GPU millions of times).

The second method computes the full matrix product, when I only need the diagonal entries, so that can't be very efficient either.

I'm wondering whether there exists any clever way of doing this?

    With `np.einsum` - https://stackoverflow.com/questions/48066890/ – Divakar May 08 '20 at 21:00
  • @Divakar. I'd argue that that's overkill, although it probably creates fewer temp arrays than the trivial solution I posted. – Mad Physicist May 08 '20 at 21:04
  • @MadPhysicist Why would it be overkill? – Divakar May 08 '20 at 21:07
  • @Divakar. Because it's not as obvious (to me), and I wasn't really thinking about it. I've closed the question since your answer already covers all the reasonable solutions I could think of. – Mad Physicist May 08 '20 at 21:09
  • Well yeah, avoiding the need for temp arrays lends itself to an efficient solution with `einsum` as well. – Divakar May 08 '20 at 21:10
  • `(x.T[:,None,:]@(x.T[:,:,None]))[:,0,0]` may be a bit faster, though it looks a lot messier. `@` broadcasts on the first of 3 dimensions. – hpaulj May 08 '20 at 22:26
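For reference, hpaulj's batched-matmul trick treats each column as a (1, n) row times an (n, 1) column, with `@` broadcasting over the leading dimension. A minimal sketch (variable names follow the question):

```python
import numpy as np

n, m = 3, 2
x = np.random.randn(n, m)

# x.T has shape (m, n); the singleton dims make @ perform m independent
# (1, n) @ (n, 1) products, broadcasting over the first dimension
y = (x.T[:, None, :] @ x.T[:, :, None])[:, 0, 0]

# Same values as the diagonal of the full product
assert np.allclose(y, np.diag(x.T @ x))
```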

1 Answer


Use a manually constructed sum-product. You want the sums of the squares of the individual columns:

y = (x * x).sum(axis=0)
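A quick sanity check that this matches the loop-based Method 1 from the question:

```python
import numpy as np

n, m = 3, 2
x = np.random.randn(n, m)

# Loop-based reference: dot each column with itself
y_loop = np.zeros(m)
for i in range(m):
    y_loop[i] = x[:, i] @ x[:, i]

# Vectorized sum-product: square elementwise, sum over the rows
y_vec = (x * x).sum(axis=0)

assert np.allclose(y_loop, y_vec)
```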

As Divakar suggests, `np.einsum` will likely offer a less memory-intensive option, since it does not require the temporary array `x * x`:

y = np.einsum('ij,ij->j', x, x)
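Since the question mentions PyTorch on a GPU: both forms carry over directly, assuming a tensor `x` of shape `(n, m)` (a sketch, not benchmarked):

```python
import torch

n, m = 3, 2
x = torch.randn(n, m)

# Elementwise square, then sum over the rows (dim 0)
y = (x * x).sum(dim=0)

# Equivalent einsum form; torch.einsum uses the same notation as np.einsum
y2 = torch.einsum('ij,ij->j', x, x)

assert torch.allclose(y, y2)
```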
Mad Physicist