I am running Ubuntu 14.04 with the Anaconda Python distribution, which links numpy against Intel's Math Kernel Library (MKL). My processor is an Intel Xeon with 8 cores and no Hyper-Threading (so 8 threads in total).
For me, numpy's tensordot consistently outperforms einsum for large arrays, although others have found very little difference between the two, or even that einsum can outperform tensordot for some operations.

For people with a numpy distribution built against a fast BLAS library, I am wondering why this might happen. Does MKL run more slowly on non-Intel processors? Or does einsum run faster on more modern Intel processors with better threading capabilities?
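To check which BLAS your numpy is linked against, and how many threads MKL may use, something like this should work (the environment variables are the standard MKL/OpenMP knobs; they have to be set before numpy is first imported to reliably take effect):

    import os

    # Pin MKL's thread pool before numpy (and hence MKL) is loaded;
    # MKL_NUM_THREADS / OMP_NUM_THREADS are the standard MKL/OpenMP variables.
    os.environ["MKL_NUM_THREADS"] = "8"
    os.environ["OMP_NUM_THREADS"] = "8"

    import numpy as np

    # Print the BLAS/LAPACK libraries numpy was built against;
    # an MKL-backed Anaconda build lists the mkl libraries here.
    np.show_config()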
Here is a quick example comparing the two on my machine:
In [27]: a = rand(100,1000,2000)
In [28]: b = rand(50,1000,2000)
In [29]: time cten = tensordot(a, b, axes=[(1,2),(1,2)])
CPU times: user 7.85 s, sys: 29.4 ms, total: 7.88 s
Wall time: 1.08 s
In [30]: "FLOPS TENSORDOT: {}.".format(cten.size * 1000 * 2000 / 1.08)
Out[30]: 'FLOPS TENSORDOT: 9259259259.26.'
In [31]: time cein = einsum('ijk,ljk->il', a, b)
CPU times: user 42.3 s, sys: 7.58 ms, total: 42.3 s
Wall time: 42.4 s
In [32]: "FLOPS EINSUM: {}.".format(cein.size * 1000 * 2000 / 42.4)
Out[32]: 'FLOPS EINSUM: 235849056.604.'
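For anyone reproducing this outside IPython, here is a self-contained version of the same comparison (same shapes as above; note the two arrays take roughly 2.4 GB together, and exact timings will of course depend on the machine and BLAS build):

    import timeit
    import numpy as np

    a = np.random.rand(100, 1000, 2000)   # ~1.6 GB
    b = np.random.rand(50, 1000, 2000)    # ~0.8 GB

    # tensordot contracts axes 1 and 2 of both arrays -> (100, 50) result
    t_td = timeit.timeit(
        lambda: np.tensordot(a, b, axes=[(1, 2), (1, 2)]), number=1)

    # einsum spells out the identical contraction
    t_es = timeit.timeit(
        lambda: np.einsum('ijk,ljk->il', a, b), number=1)

    # count one multiply per contracted element per output element,
    # matching the FLOPS estimate used above
    flops = 100 * 50 * 1000 * 2000
    print("tensordot: {:.2f} s, {:.2f} GFLOPS".format(t_td, flops / t_td / 1e9))
    print("einsum:    {:.2f} s, {:.2f} GFLOPS".format(t_es, flops / t_es / 1e9))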
Tensor contractions with tensordot consistently run in the 5-20 GFLOPS range on this machine; with einsum I only get about 0.2 GFLOPS.
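My working assumption, which may be part of the answer I am looking for, is that tensordot can reduce this contraction to a single BLAS matrix multiply via reshapes, while einsum (at least in this numpy version) falls back to compiled loops instead. The equivalence is easy to verify on C-contiguous arrays:

    import numpy as np

    a = np.random.rand(10, 30, 40)   # smaller arrays, same structure as above
    b = np.random.rand(5, 30, 40)

    # Flattening the two contracted axes turns the contraction into a single
    # matrix product -- the kind of GEMM call that MKL threads and vectorizes.
    c_gemm = a.reshape(10, -1).dot(b.reshape(5, -1).T)
    c_td = np.tensordot(a, b, axes=[(1, 2), (1, 2)])

    print(np.allclose(c_gemm, c_td))   # True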