I have NumPy compiled against OpenBLAS, and I am wondering why einsum is so much slower than dot. I understand it in the three-index case, but I don't understand why it is also slower in the two-index case. Here is an example:
```python
import numpy as np
A = np.random.random([1000,1000])
B = np.random.random([1000,1000])

%timeit np.dot(A,B)
# 10 loops, best of 3: 26.3 ms per loop

%timeit np.einsum("ij,jk",A,B)
# 5 loops, best of 3: 477 ms per loop
```
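To reproduce the comparison outside IPython, where `%timeit` is not available, a minimal sketch using the standard-library `timeit` module could look like the one below. The helper name `compare` is only illustrative. Absolute timings will depend on the machine and on the BLAS that NumPy links against, which `np.show_config()` reports.

```python
import timeit

import numpy as np


def compare(n=1000, repeat=3):
    """Time np.dot against np.einsum for an n x n matrix product.

    Returns the best time in seconds for each call and whether the two
    results agree numerically. Timings are machine- and build-dependent.
    """
    A = np.random.random((n, n))
    B = np.random.random((n, n))
    t_dot = min(timeit.repeat(lambda: np.dot(A, B), number=1, repeat=repeat))
    # "ij,jk" uses einsum's implicit mode, so the output subscripts are "ik"
    t_einsum = min(timeit.repeat(lambda: np.einsum("ij,jk", A, B),
                                 number=1, repeat=repeat))
    same = np.allclose(np.dot(A, B), np.einsum("ij,jk", A, B))
    return t_dot, t_einsum, same


if __name__ == "__main__":
    np.show_config()  # shows which BLAS/LAPACK this NumPy build uses
    t_dot, t_einsum, same = compare()
    print(f"dot:    {t_dot * 1e3:.1f} ms")
    print(f"einsum: {t_einsum * 1e3:.1f} ms")
    print("results agree:", same)
```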
Is there a way to make einsum use OpenBLAS and multithreading the way np.dot does? And why doesn't np.einsum simply call np.dot when it recognizes a plain matrix product?
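Two possible workarounds, sketched under stated assumptions. First, NumPy 1.12 and later accept an `optimize` argument to `np.einsum`. With it enabled, NumPy plans the contraction and, as I understand its implementation, can hand BLAS-friendly pairwise contractions to `np.tensordot` (and therefore to the BLAS-backed dot). Whether that happens for a given expression may vary between versions, and `np.einsum_path` shows the plan it picked. Second, `einsum_or_dot` below is a hypothetical helper, not a NumPy API: it routes the exact spellings `"ij,jk"` / `"ij,jk->ik"` to `np.dot` and falls back to `np.einsum` for everything else.

```python
import numpy as np


def matmul_via_einsum(a, b):
    """Matrix product through einsum with path optimization enabled.

    Assumes NumPy >= 1.12, where np.einsum accepts the ``optimize`` keyword.
    """
    return np.einsum("ij,jk->ik", a, b, optimize=True)


def einsum_or_dot(subscripts, *operands):
    """Hypothetical helper (not part of NumPy).

    Sends a plain 2-D matrix product spelled "ij,jk" or "ij,jk->ik" to
    np.dot; any other expression, including other index letters, goes to
    np.einsum unchanged.
    """
    spec = subscripts.replace(" ", "")  # accept "ij, jk -> ik" as well
    if len(operands) == 2 and spec in ("ij,jk", "ij,jk->ik"):
        a, b = operands
        if a.ndim == 2 and b.ndim == 2:
            return np.dot(a, b)  # uses BLAS for float/complex arrays
    return np.einsum(subscripts, *operands)


if __name__ == "__main__":
    A = np.random.random([1000, 1000])
    B = np.random.random([1000, 1000])
    reference = np.dot(A, B)
    print(np.allclose(matmul_via_einsum(A, B), reference))
    print(np.allclose(einsum_or_dot("ij,jk", A, B), reference))
    # The second element of einsum_path's result is a human-readable report
    print(np.einsum_path("ij,jk->ik", A, B, optimize=True)[1])
```

The helper only pattern-matches literal subscript strings, so an equivalent product written with other letters (for example `"ab,bc"`) would still take the slower einsum route.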