faster alternative to numpy einsum?

Question

I have two matrices, U and V, each containing n vectors:

U = [u1, u2, u3,... un]
V = [v1, v2, v3,... vn]

(in fact U and V are 2d numpy arrays, the vectors u1..un, and v1...vn are 1d)

I want to create a 1d numpy array of n values:

W = [dot(u1, v1), dot(u2, v2), dot(u3, v3),..., dot(un, vn)]

(dot = dot product)

Currently I do this with numpy's einsum:

W = np.einsum('ij,ij->i', U, V)
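
For readers who want to check what this call computes, here is a minimal sketch on small made-up arrays. It assumes each row of U and V is one vector; the shapes and the `W_loop` name are illustrative and not from the original post.

```python
import numpy as np

# Small illustrative inputs: 4 vectors of length 3, stored as rows.
rng = np.random.default_rng(0)
U = rng.standard_normal((4, 3))
V = rng.standard_normal((4, 3))

# Row-wise dot products: multiply matching entries, then sum over axis j.
W = np.einsum('ij,ij->i', U, V)

# Reference version with an explicit Python loop over the row pairs.
W_loop = np.array([np.dot(u, v) for u, v in zip(U, V)])

print(W.shape)                 # one value per row pair
print(np.allclose(W, W_loop))  # the two versions should agree
```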

However, my matrices are large and the calculation is repeated many times (with different U and V), so I am looking for ways to make it faster.

I understand that there could be faster alternatives to np.einsum, like np.dot, np.tensordot, or einsum2, that utilize parallelism (while np.einsum is sequential). But I did not manage to get the same functionality: with both np.dot and np.tensordot, the output is a 2d matrix that contains the results of more dot products than I need.
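
To illustrate the problem with np.dot, here is a small sketch assuming rows are the vectors. The full product holds all n*n pairwise dot products, and only its diagonal is the wanted result. The sizes `n` and `d` are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3  # illustrative sizes: n vectors of length d
U = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))

# np.dot(U, V.T) computes dot(u_i, v_j) for every pair (i, j): an (n, n) matrix.
full = np.dot(U, V.T)

# Only the diagonal entries dot(u_i, v_i) are needed; the other
# n*n - n products are wasted work.
W_from_diag = np.diagonal(full)

W = np.einsum('ij,ij->i', U, V)
print(full.shape)
print(np.allclose(W, W_from_diag))
```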

I also started to look at NumExpr but didn't find a way to achieve what I need in an elegant way and without more dot products than required.

If someone knows of a way to improve on einsum's performance for my use-case I'd appreciate it.

The shape depends on data from the user. In my tests I use the shape (20, 10000). — Assaf, Oct 06 '20 at 18:07
@MadPhysicist that's gonna be slower due to overhead for intermediate arrays. — Quang Hoang, Oct 06 '20 at 18:10
@QuangHoang. What surprised me is that if I use `np.multiply(U, V, out=U).sum(1)`, it is ~20% slower (same for `out=V`). — Mad Physicist, Oct 06 '20 at 18:13
@Divakar I'll check your solution and report back (Adding ravel() to flatten the result) — Assaf, Oct 06 '20 at 18:15
Does anyone know if tensorflow, numba, or whatnot would adequately fuse the `(U * V).sum(1)` answer above since OP is doing the same kind of computation to many inputs? — Hans Musgrave, Oct 06 '20 at 18:26
@Divakar your solution gave me an improvement of 14% on average (20 runs with each implementation), which is very nice. Please post your suggestion as an answer so I can accept it later. I'll wait a little more to see if there are more ideas. — Assaf, Oct 06 '20 at 18:36
If that works, we can close as a duplicate of - https://stackoverflow.com/questions/35090401/how-to-calculate-the-dot-product-of-two-arrays-of-vectors-in-python. Just added that one there. — Divakar, Oct 06 '20 at 18:43
Out of curiosity, does anyone know why np.matmul is faster than np.einsum? is it due to parallelism or something else? — Assaf, Oct 06 '20 at 21:16
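
A sketch for comparing the variants mentioned in this thread. The exact code of the suggested solution is not shown here, so the batched `np.matmul` form below is an assumption, inferred from the mentions of np.matmul and ravel(). The function names are hypothetical, and timings will differ by machine and NumPy build.

```python
import numpy as np
from timeit import timeit

rng = np.random.default_rng(0)
U = rng.standard_normal((20, 10000))  # test shape mentioned in the comments
V = rng.standard_normal((20, 10000))

def via_einsum(U, V):
    return np.einsum('ij,ij->i', U, V)

def via_multiply_sum(U, V):
    # Allocates an intermediate (n, d) array before summing.
    return (U * V).sum(1)

def via_matmul(U, V):
    # Assumed form: treat each row pair as a (1, d) @ (d, 1) product,
    # giving shape (n, 1, 1), then flatten to (n,).
    return np.matmul(U[:, None, :], V[:, :, None]).ravel()

variants = (via_einsum, via_multiply_sum, via_matmul)
results = {f.__name__: f(U, V) for f in variants}

for f in variants:
    seconds = timeit(lambda: f(U, V), number=20)
    print(f"{f.__name__}: {seconds / 20 * 1e6:.1f} us per call")
```

All three functions should return the same values up to floating-point rounding, so any of them can be swapped in once measured on the real data.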
