I have two matrices, U and V, each containing n vectors:
U = [u1, u2, u3,... un]
V = [v1, v2, v3,... vn]
(In fact, U and V are 2D numpy arrays, and the vectors u1...un and v1...vn are their 1D rows.)
I want to create a 1d numpy array of n values:
W = [dot(u1, v1), dot(u2, v2), dot(u3, v3),..., dot(un, vn)]
(dot = dot product)
Currently I do this with numpy's einsum:
W = np.einsum('ij,ij->i', U, V)
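To make the setup concrete, here is a minimal sketch of what I am computing. The sizes n and d and the random data are placeholders I made up for illustration (my real arrays are much larger), and the explicit loop is only there to confirm that the einsum call gives the row-wise dot products:

```python
import numpy as np

# Illustrative sizes only; the real arrays are much larger.
rng = np.random.default_rng(0)
n, d = 5, 3
U = rng.standard_normal((n, d))  # row i is the vector u_i
V = rng.standard_normal((n, d))  # row i is the vector v_i

# Current approach: one dot product per pair of matching rows.
W = np.einsum('ij,ij->i', U, V)

# Reference implementation with an explicit Python loop (slow, for checking only).
W_loop = np.array([np.dot(U[i], V[i]) for i in range(n)])

print(W.shape)                 # 1d array with n entries
print(np.allclose(W, W_loop))  # the two results should agree
```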
However, my matrices are large and the calculation is repeated many times (with different U and V), so I am looking for ways to make it faster.
I understand that there could be faster alternatives to np.einsum, such as np.dot, np.tensordot, or einsum2, that utilize parallelism (whereas np.einsum is, as far as I know, sequential). However, I did not manage to get the same functionality: with both np.dot and np.tensordot the output is a 2D matrix that contains the results of more dot products than I need.
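The following sketch shows the problem I ran into. The sizes and random data are again made up. np.dot(U, V.T) and np.tensordot both compute all n*n pairwise dot products, and only the diagonal is what I want. The last line is an elementwise multiply followed by a row sum, which gives the same values as the einsum call; I have not benchmarked it, so I am not claiming it is faster:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 4, 3  # illustrative sizes only
U = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))

# What I want: n row-wise dot products.
W = np.einsum('ij,ij->i', U, V)

# What np.dot / np.tensordot give me: an (n, n) matrix of all pairwise
# dot products, of which only the diagonal is needed.
full_dot = np.dot(U, V.T)
full_tensordot = np.tensordot(U, V, axes=([1], [1]))
diag = np.diagonal(full_dot)

# Equivalent result without the extra products (speed not measured here).
W_mul = (U * V).sum(axis=1)

print(full_dot.shape)          # (n, n), i.e. n*n products instead of n
print(np.allclose(diag, W))
print(np.allclose(W_mul, W))
```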
I also started to look at NumExpr but didn't find a way to achieve what I need in an elegant way and without more dot products than required.
If someone knows of a way to improve on einsum's performance for my use-case I'd appreciate it.