Following @ajcr's great answer, I wanted to determine which method is the fastest, so I used timeit:
import timeit
setup_code = """
import numpy as np
i,j,k = (300,200,400)
ind = np.ones((i,j,k)) #shape=(300, 200, 400)
dist = np.random.rand(i,j) #shape=(300, 200)
"""
basic = "np.array([np.dot(dist[l],ind[l]) for l in xrange(dist.shape[0])])"
einsum = "np.einsum('ijk,ij->ik', ind, dist)"
tensor = "np.tensordot(ind, dist, axes=[1, 1])[0].T"
print "tensor - total time:", min(timeit.repeat(stmt=tensor,setup=setup_code,number=10,repeat=3))
print "basic - total time:", min(timeit.repeat(stmt=basic,setup=setup_code,number=10,repeat=3))
print "einsum - total time:", min(timeit.repeat(stmt=einsum,setup=setup_code,number=10,repeat=3))
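For readers on Python 3 (where xrange and the print statement no longer exist), the same benchmark can be written as below — a sketch that uses smaller arrays than the original (300, 200, 400) so it runs quickly; the absolute timings will therefore differ from the numbers reported here:

```python
import timeit

setup_code = """
import numpy as np
i, j, k = (30, 20, 40)  # scaled down from the original (300, 200, 400)
ind = np.ones((i, j, k))
dist = np.random.rand(i, j)
"""

# the three candidate expressions, unchanged except range() for xrange()
basic = "np.array([np.dot(dist[l], ind[l]) for l in range(dist.shape[0])])"
einsum = "np.einsum('ijk,ij->ik', ind, dist)"
tensor = "np.tensordot(ind, dist, axes=[1, 1])[0].T"

for name, stmt in [("tensor", tensor), ("basic", basic), ("einsum", einsum)]:
    t = min(timeit.repeat(stmt=stmt, setup=setup_code, number=10, repeat=3))
    print(name, "- total time:", t)
```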
The results were surprising:
tensor - total time: 6.59519493952
basic - total time: 0.159871203461
einsum - total time: 0.263569731028
So obviously using tensordot was the wrong way to do it (not to mention the memory errors it raises in bigger examples, just as @ajcr stated).
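As an aside, a quick sanity check (my own sketch, not part of the original answer) suggests the three expressions only produce the same array here because ind is filled with ones: the [0].T slice after tensordot keeps only the contraction against ind[0], so with a random ind it no longer matches the batched product computed by the loop and by einsum:

```python
import numpy as np

i, j, k = (3, 2, 4)
rng = np.random.default_rng(0)

ind = rng.random((i, j, k))   # random instead of np.ones
dist = rng.random((i, j))

# the "basic" loop and einsum both compute a batched vector-matrix product
basic = np.array([np.dot(dist[l], ind[l]) for l in range(dist.shape[0])])
eins = np.einsum('ijk,ij->ik', ind, dist)

# tensordot contracts every batch of dist against every batch of ind
# (shape (i, k, i)); [0].T keeps only the part involving ind[0]
tens = np.tensordot(ind, dist, axes=[1, 1])[0].T

print(np.allclose(basic, eins))  # True
print(np.allclose(basic, tens))  # False for random ind
```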
Since this example was small, I changed the matrix sizes to i,j,k = (3000,200,400), flipped the order just to be sure it had no effect, and set up another test with a higher number of repetitions:
print "einsum - total time:", min(timeit.repeat(stmt=einsum,setup=setup_code,number=50,repeat=3))
print "basic - total time:", min(timeit.repeat(stmt=basic,setup=setup_code,number=50,repeat=3))
The results were consistent with the first run:
einsum - total time: 13.3184077671
basic - total time: 8.44810031351
However, testing another kind of size growth - i,j,k = (30000,20,40) - led to the following results:
einsum - total time: 0.325594117768
basic - total time: 0.926416766397
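One plausible reading of this reversal (my interpretation, not a claim from the original answer): the list comprehension pays Python-level overhead once per row, so when i is huge and each per-row dot product is tiny, that overhead dominates, whereas einsum performs the whole contraction in a single C-level pass. A scaled-down sketch of the same regime (3000 rows instead of 30000, to keep it quick):

```python
import timeit

setup_code = """
import numpy as np
i, j, k = (3000, 20, 40)  # many small matrices, scaled down from (30000, 20, 40)
ind = np.ones((i, j, k))
dist = np.random.rand(i, j)
"""

# one Python-level np.dot call per row vs. a single einsum call
basic = "np.array([np.dot(dist[l], ind[l]) for l in range(dist.shape[0])])"
einsum = "np.einsum('ijk,ij->ik', ind, dist)"

t_basic = min(timeit.repeat(stmt=basic, setup=setup_code, number=5, repeat=3))
t_einsum = min(timeit.repeat(stmt=einsum, setup=setup_code, number=5, repeat=3))
print("basic:", t_basic, "einsum:", t_einsum)
```

Exact timings depend on the machine and NumPy build, so treat the printed numbers as indicative of the trend rather than reproducible figures.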
See the comments for explanations for these results.
The moral is: when looking for the fastest solution to a specific problem, try to generate test data that is as similar to the real data as possible, in terms of types and shapes. In my case, i is much smaller than j and k, so I stayed with the ugly version, which is also the fastest in this case.