
I am trying to figure out how to broadcast a dot product operation over a 10x10x10 3D grid. I set up the following arrays:

A.shape=(5,5,10,10,10)
b.shape=(5,10,10,10)

I want to obtain an array like the following

c.shape=(5,10,10,10)

So far I have obtained it with the following code:

c=np.sum(A*b,axis=1)

I believe, however, that I should be able to obtain the same result with np.dot or np.tensordot. I have tried hard but cannot manage to produce an equivalent result. Doing so would help me a lot in understanding how np.tensordot works, as I will also need to use np.tensorsolve further on in my work.
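
To make the intended operation concrete, here is a minimal loop-based sketch of what I mean (slow, for illustration only; it performs one 5x5 matrix times length-5 vector dot product per grid point):

import numpy as np

A = np.random.rand(5, 5, 10, 10, 10)
b = np.random.rand(5, 10, 10, 10)

# c[:, i, j, k] = A[:, :, i, j, k] @ b[:, i, j, k] at every grid point
c_loop = np.empty((5, 10, 10, 10))
for i in range(10):
    for j in range(10):
        for k in range(10):
            c_loop[:, i, j, k] = A[:, :, i, j, k].dot(b[:, i, j, k])

assert np.allclose(c_loop, np.sum(A * b, axis=1))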

Michele
  • `tensordot` effectively does an outer product on the dimensions that aren't summed, producing in your case `(5, 10, 10, 10, 10, 10, 10)`. The desired values are embedded in there as a kind of diagonal, but it calculates a lot more values than you need. With some axis swapping, `matmul` could perform the desired calculation, since its outer dimensions "go along for the ride". But `einsum` is the easiest to use. – hpaulj May 06 '18 at 15:54
  • If your question has been answered, consider accepting one of those. More info - https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work/5235#5235. – Divakar May 08 '18 at 06:46

2 Answers


We need to keep the last four axes of A aligned with the four axes of b, and all of them appear in the output except the second axis of A (axis=1), which is sum-reduced. For such a case, np.einsum is the way to go, apart from np.matmul. With np.einsum, it is easy and intuitive, like so -

c = np.einsum('ijklm,jklm->iklm',A,b)
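
As a quick sanity check against the sum-based version from the question (random data, same shapes):

import numpy as np

A = np.random.rand(5, 5, 10, 10, 10)
b = np.random.rand(5, 10, 10, 10)

# j is summed away; i, k, l, m survive into the output
c_einsum = np.einsum('ijklm,jklm->iklm', A, b)
c_sum = np.sum(A * b, axis=1)
print(np.allclose(c_einsum, c_sum))  # True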
Divakar
  • Thank you very much for your answer! I guess then that np.tensordot is not fit for such a problem? What if I reformulate the arrays as follows? `A.shape=(10,10,10,5,5)` `b.shape=(10,10,10,5)` `c.shape=(10,10,10,5)` – Michele May 06 '18 at 15:25
  • @Michele Would depend on how would get `c` in that case. Simply stating output shapes doesn't give us the entire picture. Also, to see how tensordot works, follow this - https://stackoverflow.com/questions/41870228/understanding-tensordot – Divakar May 06 '18 at 15:27
  • The idea is to perform a dot product A*b (with A=(5x5), b=(5x1)) for every coordinate of a 3D system. Both A and b vary among the coordinates. – Michele May 06 '18 at 15:33
  • @Michele So, I guess : `np.einsum('ijklm,ijkm->ijkl',A,b)` for that case. But, `tensordot` won't work because of the axes alignment requirement again. – Divakar May 06 '18 at 15:35
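
To illustrate the reordered case from these comments, a minimal sketch (array names here are made up) showing that np.matmul handles trailing "dot" axes directly:

import numpy as np

A2 = np.random.rand(10, 10, 10, 5, 5)   # 'dot' axes last
b2 = np.random.rand(10, 10, 10, 5)

# matmul treats the leading (10, 10, 10) axes as batch dimensions
c2 = (A2 @ b2[..., None])[..., 0]        # shape (10, 10, 10, 5)
print(np.allclose(c2, np.einsum('ijklm,ijkm->ijkl', A2, b2)))  # True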

If the 'dot' dimensions are at the end, matmul will work.

Comparing 3 methods:

In [252]: A=np.arange(5*5*10*10*10).reshape((5,5,10,10,10))
In [253]: B=np.arange(5*10*10*10).reshape((5,10,10,10))

In [254]: C=np.sum(A*B, axis=1)
In [255]: D=np.einsum('ijklm,jklm->iklm',A,B)
In [256]: E = (A.transpose(2,3,4,0,1)@B.transpose(1,2,3,0)[...,None])[...,0].transpose(3,0,1,2)

All the transposes turn the arrays into (..., 5, 5) and (..., 5, 1).

In [257]: np.allclose(C,D)
Out[257]: True
In [258]: np.allclose(C,E)
Out[258]: True

In [259]: timeit C=np.sum(A*B, axis=1)
124 µs ± 4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [260]: timeit D=np.einsum('ijklm,jklm->iklm',A,B)
66 µs ± 18.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [261]: timeit E = (A.transpose(2,3,4,0,1)@B.transpose(1,2,3,0)[...,None])[...,0].transpose(3,0,1,2)
68.6 µs ± 973 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

tensordot reshapes and transposes the arrays so it can perform a simple 2D dot, and then converts the result back. The last 3 dimensions are effectively collapsed into a single dimension of size 1000.

In [262]: np.tensordot(A,B,(1,0)).shape
Out[262]: (5, 10, 10, 10, 10, 10, 10)
In [263]: timeit np.tensordot(A,B,(1,0)).shape
74 ms ± 70.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

The result is much bigger: a kind of outer product over the non-summed dimensions. The desired values are in there, buried as a multidimensional diagonal.
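
For illustration, those values can be recovered from the big tensordot result with an advanced-indexing "diagonal" (wasteful, but it shows where they live; A, B, C as defined above):

idx = np.arange(10)
T = np.tensordot(A, B, (1, 0))           # shape (5, 10, 10, 10, 10, 10, 10)
# keep only the entries where the two (i, j, k) index triples coincide
F = T[:, idx[:, None, None], idx[None, :, None], idx[None, None, :],
         idx[:, None, None], idx[None, :, None], idx[None, None, :]]
np.allclose(F, C)                        # True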

tensordot is effectively:

In [264]: X=(B.reshape(5,1000).T).dot(A.reshape(5,5,1000))
In [265]: X.shape
Out[265]: (1000, 5, 1000)
In [266]: timeit X=(B.reshape(5,1000).T).dot(A.reshape(5,5,1000))
78.9 ms ± 82.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
hpaulj
  • Thank you so much for your answer, the comparison is very useful to assess the speed of those methods! However, as you mentioned at the beginning, I've chosen to reformulate my problem moving the "dot" dimensions to the end of the arrays, which makes everything much easier. – Michele May 15 '18 at 04:05