
For a 2D matrix `X` of shape `(m, n)`, I'm trying to compute `X.T * X`, where `*` is matrix multiplication. Following the explanation in this post, I expected to be able to do this with `np.einsum('ji,ik->jk', X, X)`, where writing `ji` first on the LHS takes the transpose of the first `X` argument and then multiplies it by the second `X` argument.

This doesn't work; it fails with the following error (for `(m, n) = (3, 4)`):

ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (4,3)->(4,newaxis,3) (4,3)->(3,4)

This works, however: `np.einsum('ij,jk->ik', X.T, X)`. What am I missing here? And why is it even adding an axis in the middle anyway?
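For reference, a minimal reproduction of the failure (assuming `X = np.arange(12).reshape(3, 4)`; the exact wording of the message may vary by NumPy version). With `'ji,ik'`, the repeated label `i` would have to equal both the second axis of the first operand and the first axis of the second, which is impossible here:

```python
import numpy as np

X = np.arange(12).reshape(3, 4)  # (m, n) = (3, 4)

try:
    # 'ji,ik' asks label i to be size 4 (first operand) and size 3 (second)
    np.einsum('ji,ik->jk', X, X)
except ValueError as e:
    print(e)
```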

DilithiumMatrix

1 Answer

With `X.T * X` (`*` being matrix multiplication), you are sum-reducing the second axis of the first `X`'s transpose against the first axis of the second `X`. Now, the second axis of the first `X`'s transpose is the same as the first axis of the first `X`. Thus, we are simply sum-reducing along the first axis of both `X`'s, while their remaining axes stay on.

To replicate that with einsum, use the same character for the first axis of both inputs, and different characters for their second axes, like so -

np.einsum('ji,jk->ik', X, X)

Thus, the `j`'s are sum-reduced, while the remaining axes, `i` and `k`, remain in the output.
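A quick sanity check (a minimal sketch with a random `(3, 4)` matrix) that this einsum call matches the plain matrix product:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))

# Reduce the shared first axis j; keep i and k in the output
out = np.einsum('ji,jk->ik', X, X)

print(np.allclose(out, X.T @ X))  # True
```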

Again, this would be slower than native matrix multiplication: `X.T.dot(X)`. But I am guessing this post is meant more as a learning exercise for einsum.
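If performance matters, note that `np.einsum` accepts an `optimize=True` flag that may let it dispatch the contraction to a BLAS-backed path; all three variants below produce the same result (actual timings depend on your NumPy build and matrix sizes):

```python
import numpy as np

X = np.arange(12, dtype=float).reshape(3, 4)

via_dot = X.T.dot(X)                                   # BLAS-backed matmul
via_einsum = np.einsum('ji,jk->ik', X, X)              # plain einsum
via_opt = np.einsum('ji,jk->ik', X, X, optimize=True)  # optimized contraction path

print(np.allclose(via_dot, via_einsum), np.allclose(via_dot, via_opt))
```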

Divakar
  • I'm still a little confused. In matrix-multiplication, `A x B`, we want each *row* of `A` multiplied by each *column* of `B`, right? So `'ji,jk->ik'` makes it seem like we're taking a *row* of the transpose, and multiplying by a *row* of the matrix... no? – DilithiumMatrix Sep 22 '17 at 14:57
  • Nevermind... just confused. This makes sense, thanks @Divakar! – DilithiumMatrix Sep 22 '17 at 15:02