
I have the following problem at hand. F is a NumPy array of dimensions 2 X 100 X 65. I want to generate another array V whose dimensions are 2 X 2 X 65. This array V must be computed in the following way:

For each t, V[:, :, t] = F[:, :, t] @ F[:, :, t].T where .T means a matrix transpose and @ means the usual matrix multiplication. Right now, I am using the following approach:

aux_matrix = np.matmul(np.transpose(F, (2, 0 ,1)), np.transpose(F, (2, 1, 0)))
V = np.transpose(aux_matrix, (1, 2, 0))
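Written out as an explicit loop (with random data standing in for my actual F), the computation I want is:

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((2, 100, 65))  # placeholder data with my array's shape

# The transpose/matmul approach I'm currently using:
aux_matrix = np.matmul(np.transpose(F, (2, 0, 1)), np.transpose(F, (2, 1, 0)))
V = np.transpose(aux_matrix, (1, 2, 0))

# The same computation as a plain loop over t:
V_loop = np.empty((2, 2, 65))
for t in range(65):
    V_loop[:, :, t] = F[:, :, t] @ F[:, :, t].T

assert V.shape == (2, 2, 65)
assert np.allclose(V, V_loop)
```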

I understand that np.tensordot and np.einsum can help in this kind of situation and make things both faster and more elegant. However, I am new to tensors and I am not used to the Einstein notation. Can someone shed some light on how to perform this computation, and maybe link a reference guiding a beginner here? Thanks!

Raul Guarini Riva
  • https://numpy.org/doc/1.19/reference/generated/numpy.einsum.html – wwii Dec 27 '20 at 21:03
  • I read that but I still can't solve my problem @wwii. I don't know if np.einsum can really be used here or if I need the tensor product. – Raul Guarini Riva Dec 27 '20 at 21:32
  • I thought the question was `link a reference guiding a beginner here?` – wwii Dec 27 '20 at 21:37
  • Thanks for providing that link. I am trying to put my problem into one of those examples but I am failing to do so. I think that this is a case of "Batch Multiplication" but I am not sure. – Raul Guarini Riva Dec 27 '20 at 21:39
  • Nothing wrong with your `matmul` use. It probably will be faster than `einsum`. If the batch dimension were first it would be simpler. In numpy leading dimensions are outermost, and most suitable for batch use. – hpaulj Dec 27 '20 at 21:48
  • For einsum try 'ijk,njk->ink'. k is the shared batch, j is the sum-of-products dim. – hpaulj Dec 27 '20 at 21:55
  • @hpaulj indeed, after I realized how to do it with einsum, I found out that using einsum is actually 10x slower after a quick profiling through the cell magic ```%%timeit```. Any reasons why? Thanks for your comment. – Raul Guarini Riva Dec 27 '20 at 21:56
  • @hpaulj Thanks once more! Yes, I am using this formulation. However, I am confused. I had the impression that whenever einsum sees an index being repeated in the LHS, it would perform an element by element multiplication. However, it's not doing that. What am I missing? – Raul Guarini Riva Dec 27 '20 at 21:58
  • All dimensions are involved in a kind of elementwise mul, but some are also summed. – hpaulj Dec 27 '20 at 22:09
  • @hpaulj should I reopen? – wwii Dec 27 '20 at 22:34

1 Answer


As the solution in the comments says, the einsum equivalent of your approach is:

np.einsum("ijk,njk->ink", F, F)

Following the rules of einsum, axis=1 and axis=2 of both arrays (the axes corresponding to the labels j and k) are element-wise multiplied. Of these, j is missing from the final output and k is present. This means that the axis corresponding to j gets summed over in the final result, while the axis corresponding to k stops at element-wise multiplication.

In general, labels repeated across the inputs are element-wise multiplied, and if a repeated label is missing from the final output, a summation takes place on top of that element-wise multiplication.

This is exactly what's happening here. F has shape (2, 100, 65). "ijk,njk->ink" does the following in this case:

  • Each element along axis=0 of the first array is paired with every element along axis=0 of the second array. This is what i,n->in in the string represents. Here both i and n have size 2, and since the output starts with in, the first two dimensions of the final array have shape (2, 2).
  • The label j is repeated in both arrays, so those axes are multiplied element-wise. However, j is missing from the final output, so the result is summed along that axis. That is, the dimension of size 100 disappears from the final output.
  • The same element-wise multiplication happens for the label k, but k is still present in the final output, so no sum-reduction takes place. Hence the final output keeps the axis of length 65.

If you check the shape of the final output, it's (2, 2, 65), as expected.
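A quick sketch (with random data, not OP's actual array) confirming that the einsum expression reproduces the transpose/matmul result:

```python
import numpy as np

rng = np.random.default_rng(42)
F = rng.standard_normal((2, 100, 65))

# einsum version from this answer
V_einsum = np.einsum("ijk,njk->ink", F, F)

# OP's original transpose/matmul version
V_matmul = np.transpose(
    np.matmul(np.transpose(F, (2, 0, 1)), np.transpose(F, (2, 1, 0))),
    (1, 2, 0),
)

assert V_einsum.shape == (2, 2, 65)
assert np.allclose(V_einsum, V_matmul)
```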

I hope this clears the doubt that OP expressed in the comment.

However, it's not correct to assume that this is automatically superior to the matmul formulation in terms of performance. In terms of readability, maybe. The actual performance depends on the sizes and relative dimensions of the arrays, among other factors.

It's worth checking whether performance changes when you pass optimize=True to einsum, since I have seen it make a massive difference in some situations. For arrays of this size, however, it made things slightly worse, which might be explained by the overhead of einsum searching for a good contraction order, a cost that isn't justified given the relatively small arrays.

My rule of thumb is: if you can write a matmul solution without any additional for loops, stick with it when performance is your primary concern. On the other hand, if your program has a bunch of for loops, give einsum a try, both with and without optimize=True. Even then, there are instances where a native for loop outperforms einsum, depending on the relative dimensions of the arrays.
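A rough benchmarking sketch (with random data; the absolute numbers depend on your machine and BLAS backend, so treat them only as a relative comparison):

```python
import numpy as np
from timeit import timeit

rng = np.random.default_rng(1)
F = rng.standard_normal((2, 100, 65))

def matmul_way():
    return np.transpose(
        np.matmul(np.transpose(F, (2, 0, 1)), np.transpose(F, (2, 1, 0))),
        (1, 2, 0),
    )

def einsum_way():
    return np.einsum("ijk,njk->ink", F, F)

def einsum_opt():
    return np.einsum("ijk,njk->ink", F, F, optimize=True)

# All three produce the same values; only the timings differ.
assert np.allclose(matmul_way(), einsum_way())
assert np.allclose(matmul_way(), einsum_opt())

for name, fn in [("matmul", matmul_way),
                 ("einsum", einsum_way),
                 ("einsum+opt", einsum_opt)]:
    print(name, timeit(fn, number=100))
```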

Ananda