Achieving batch matrix multiply using tensordot

Question

I'm trying to reproduce the batched (parallel) matrix multiplication behaviour of np.matmul using only tensordot, dot, reshaping and similar operations.

The library I am translating this code to does not have a matmul that supports batched multiplication, only dot and tensordot.

Additionally, I want to avoid iterating over the first dimension and instead do this with a set of matrix multiplications and reshapes. I want as much of it as possible to run through BLAS/GPU, as I have a large number of small matrices to multiply in parallel.

Here is an example:

import numpy as np

angles = np.array([np.pi/4, 2*np.pi/4, 2*np.pi/4])

vectors = np.array([ [1,0],[1,-1],[-1,0]])

s = np.sin(angles)
c = np.cos(angles)

rotations = np.array([[c,s],[-s,c]]).T

print(rotations)

print(vectors)

print("Correct: %s" % np.matmul(rotations, vectors.reshape(3,2,1)))

# I want to do this using tensordot/reshaping, i.e. just GEMM BLAS operations underneath
print("Wrong: %s" % np.tensordot(rotations, vectors, axes=(1,1)))

The output of this is:

Correct: [[[  7.07106781e-01]
  [  7.07106781e-01]]

 [[  1.00000000e+00]
  [  1.00000000e+00]]

 [[ -6.12323400e-17]
  [ -1.00000000e+00]]]


Wrong: [[[  7.07106781e-01   1.11022302e-16  -7.07106781e-01]
  [ -7.07106781e-01  -1.41421356e+00   7.07106781e-01]]

 [[  6.12323400e-17  -1.00000000e+00  -6.12323400e-17]
  [ -1.00000000e+00  -1.00000000e+00   1.00000000e+00]]

 [[  6.12323400e-17  -1.00000000e+00  -6.12323400e-17]
  [ -1.00000000e+00  -1.00000000e+00   1.00000000e+00]]]

Is there a way to modify the second expression so that it gives the same result as the first, using only dot/tensordot?

I believe it is possible, and I have seen some comments online suggesting so, but never any examples.

`tensordot` swaps and reshapes so the problem reduces to a `dot` (and then back). Some `matmul` operations can be achieved by taking a diagonal from a much larger 'outer' calculation. – hpaulj Sep 18 '17 at 20:16
Yes, I did notice the option of taking the diagonal, but I think that would potentially be far less efficient than just looping. – Chris Bamford Sep 18 '17 at 21:23
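
The diagonal idea from the comments above can be sketched roughly as follows. This is only an illustration in NumPy using the arrays from the question; the names `full`, `batched` and `expected` are made up here, and it assumes the target library offers some way to extract a diagonal, which the question does not confirm.

```python
import numpy as np

angles = np.array([np.pi/4, 2*np.pi/4, 2*np.pi/4])
vectors = np.array([[1, 0], [1, -1], [-1, 0]])
s, c = np.sin(angles), np.cos(angles)
rotations = np.array([[c, s], [-s, c]]).T  # shape (3, 2, 2)

# Contract the last axis of rotations with the last axis of vectors:
# full[i, j, m] = sum_k rotations[i, j, k] * vectors[m, k], shape (3, 2, 3)
full = np.tensordot(rotations, vectors, axes=(2, 1))

# Keep only the entries where the matrix index i equals the vector index m.
# np.diagonal places the diagonal on the last axis, hence the transpose.
batched = np.diagonal(full, axis1=0, axis2=2).T  # shape (3, 2)

# Reference result from np.matmul, with the trailing singleton axis dropped.
expected = np.matmul(rotations, vectors.reshape(3, 2, 1))[..., 0]
print(np.allclose(batched, expected))
```

Note that `full` holds every matrix applied to every vector, so for N matrices this does about N times more arithmetic than needed, which is the inefficiency raised in the reply above.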

Divakar · Answer 1 · 2017-09-18T17:44:47.900

4

We need to keep one axis aligned between the two inputs and also keep that axis in the output, so tensordot/dot won't work here. The tensordot documentation may help explain why: it sums over the chosen axes, and every remaining axis of both inputs appears separately in the output. But we can use np.einsum, which in most cases (in my experience) is marginally faster than np.matmul.

The implementation would look something like this -

np.einsum('ijk,ik->ij',rotations, vectors)

Also, it seems the desired output has one trailing singleton dim. So, append a new axis there with None/np.newaxis, like so -

np.einsum('ijk,ik->ij',rotations, vectors)[...,None]
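
As a quick sanity check, the einsum expression can be compared against the np.matmul call from the question. This is a small sketch that reuses the question's arrays; the names `via_einsum` and `via_matmul` are only illustrative.

```python
import numpy as np

angles = np.array([np.pi/4, 2*np.pi/4, 2*np.pi/4])
vectors = np.array([[1, 0], [1, -1], [-1, 0]])
s, c = np.sin(angles), np.cos(angles)
rotations = np.array([[c, s], [-s, c]]).T  # shape (3, 2, 2)

# i is the batch axis kept in both inputs and the output; k is summed over.
via_einsum = np.einsum('ijk,ik->ij', rotations, vectors)[..., None]

# Reference: batched matrix-vector product from the question.
via_matmul = np.matmul(rotations, vectors.reshape(3, 2, 1))

print(via_einsum.shape)
print(np.allclose(via_einsum, via_matmul))
```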

edited Sep 18 '17 at 17:44

answered Sep 18 '17 at 17:37


"The library I am translating this to using does not have a matmul that supports parallel multiplication, only dot and tensordot. " - I cannot use einsum either. Any other ideas? – Chris Bamford Sep 18 '17 at 19:10
@ChrisBamford What library is it? Is it tensorflow? – Divakar Sep 18 '17 at 19:13
@ChrisBamford Nah, it's not possible with `dot/tensordot` in a vectorized manner. So, I would say just loop through. – Divakar Sep 18 '17 at 19:19
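
For reference, the loop fallback suggested in the last comment might look like the sketch below. It is written in NumPy with the question's arrays, and the name `looped` is illustrative; how the per-item results are stacked in the target library is an assumption, since that library is not named.

```python
import numpy as np

angles = np.array([np.pi/4, 2*np.pi/4, 2*np.pi/4])
vectors = np.array([[1, 0], [1, -1], [-1, 0]])
s, c = np.sin(angles), np.cos(angles)
rotations = np.array([[c, s], [-s, c]]).T  # shape (3, 2, 2)

# One (2, 2) x (2,) dot product per batch item, iterating over the first axis.
looped = np.array([np.dot(r, v) for r, v in zip(rotations, vectors)])

# Add the trailing singleton axis to match the np.matmul output shape.
looped = looped[..., None]  # shape (3, 2, 1)

print(np.allclose(looped, np.matmul(rotations, vectors.reshape(3, 2, 1))))
```

Each iteration is a tiny 2x2 product, so the per-call overhead dominates for large batches, which is the cost the question hoped to avoid.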
