Following @ajcr's great answer, I wanted to determine which method is the fastest, so I used timeit:
import timeit
setup_code = """
import numpy as np
i,j,k = (300,200,400)
ind = np.ones((i,j,k)) #shape=(300, 200, 400)
dist = np.random.rand(i,j) #shape=(300, 200)
"""
basic = "np.array([np.dot(dist[l],ind[l]) for l in xrange(dist.shape[0])])"
einsum = "np.einsum('ijk,ij->ik', ind, dist)"
tensor = "np.tensordot(ind, dist, axes=[1, 1])[0].T"
print "tensor - total time:", min(timeit.repeat(stmt=tensor,setup=setup_code,number=10,repeat=3))
print "basic - total time:", min(timeit.repeat(stmt=basic,setup=setup_code,number=10,repeat=3))
print "einsum - total time:", min(timeit.repeat(stmt=einsum,setup=setup_code,number=10,repeat=3))
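For readers on Python 3 (where xrange and the print statement no longer exist), the same benchmark can be written as below — a sketch that uses smaller arrays than the original (300, 200, 400) so it runs quickly; the absolute timings will therefore differ from the numbers reported here:

```python
import timeit

setup_code = """
import numpy as np
i, j, k = (30, 20, 40)  # scaled down from the original (300, 200, 400)
ind = np.ones((i, j, k))
dist = np.random.rand(i, j)
"""

# the three candidate expressions, unchanged except range() for xrange()
basic = "np.array([np.dot(dist[l], ind[l]) for l in range(dist.shape[0])])"
einsum = "np.einsum('ijk,ij->ik', ind, dist)"
tensor = "np.tensordot(ind, dist, axes=[1, 1])[0].T"

for name, stmt in [("tensor", tensor), ("basic", basic), ("einsum", einsum)]:
    t = min(timeit.repeat(stmt=stmt, setup=setup_code, number=10, repeat=3))
    print(name, "- total time:", t)
```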
The results were surprising:
tensor - total time: 6.59519493952
basic - total time: 0.159871203461
einsum - total time: 0.263569731028
So obviously using tensordot was the wrong way to do it (not to mention the memory errors it raises in bigger examples, just as @ajcr stated).
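As an aside, a quick sanity check (my own sketch, not part of the original answer) suggests the three expressions only produce the same array here because ind is filled with ones: the [0].T slice after tensordot keeps only the contraction against ind[0], so with a random ind it no longer matches the batched product computed by the loop and by einsum:

```python
import numpy as np

i, j, k = (3, 2, 4)
rng = np.random.default_rng(0)

ind = rng.random((i, j, k))   # random instead of np.ones
dist = rng.random((i, j))

# the "basic" loop and einsum both compute a batched vector-matrix product
basic = np.array([np.dot(dist[l], ind[l]) for l in range(dist.shape[0])])
eins = np.einsum('ijk,ij->ik', ind, dist)

# tensordot contracts every batch of dist against every batch of ind
# (shape (i, k, i)); [0].T keeps only the part involving ind[0]
tens = np.tensordot(ind, dist, axes=[1, 1])[0].T

print(np.allclose(basic, eins))  # True
print(np.allclose(basic, tens))  # False for random ind
```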
Since this example was small, I changed the matrix sizes to i,j,k = (3000,200,400), flipped the order just to be sure it had no effect, and set up another test with a higher number of repetitions:
print "einsum - total time:", min(timeit.repeat(stmt=einsum,setup=setup_code,number=50,repeat=3))
print "basic - total time:", min(timeit.repeat(stmt=basic,setup=setup_code,number=50,repeat=3))
The results were consistent with the first run:
einsum - total time: 13.3184077671
basic - total time: 8.44810031351
However, testing another kind of size growth - i,j,k = (30000,20,40) - led to the following results:
einsum - total time: 0.325594117768
basic - total time: 0.926416766397
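One plausible reading of this reversal (my interpretation, not a claim from the original answer): the list comprehension pays Python-level overhead once per row, so when i is huge and each per-row dot product is tiny, that overhead dominates, whereas einsum performs the whole contraction in a single C-level pass. A scaled-down sketch of the same regime (3000 rows instead of 30000, to keep it quick):

```python
import timeit

setup_code = """
import numpy as np
i, j, k = (3000, 20, 40)  # many small matrices, scaled down from (30000, 20, 40)
ind = np.ones((i, j, k))
dist = np.random.rand(i, j)
"""

# one Python-level np.dot call per row vs. a single einsum call
basic = "np.array([np.dot(dist[l], ind[l]) for l in range(dist.shape[0])])"
einsum = "np.einsum('ijk,ij->ik', ind, dist)"

t_basic = min(timeit.repeat(stmt=basic, setup=setup_code, number=5, repeat=3))
t_einsum = min(timeit.repeat(stmt=einsum, setup=setup_code, number=5, repeat=3))
print("basic:", t_basic, "einsum:", t_einsum)
```

Exact timings depend on the machine and NumPy build, so treat the printed numbers as indicative of the trend rather than reproducible figures.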
See the comments for explanations for these results.
The moral is: when looking for the fastest solution to a specific problem, try to generate test data that is as similar to the real data as possible, in terms of types and shapes. In my case, i is much smaller than j and k, so I stayed with the ugly version, which is also the fastest in this case.