
I am trying to understand the numpy.einsum() function, but the documentation as well as this answer from Stack Overflow still leave me with some questions.

Let's take the Einstein sum and the matrices defined in the answer.

A = np.array([0, 1, 2])

B = np.array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11]])

np.einsum('i,ij->i', A, B)

So, based on my understanding of Einstein summation, I would translate this call to the notation (A_i * B_ij), i.e. summing over the repeated index i, so I would obtain:

j = 1 : A_1*B_11 + A_2*B_21 + A_3*B_31

j = 2 : A_1*B_12 + A_2*B_22 + A_3*B_32

and so on until j = 4. This gives

j = 1 : 0 + 4 + 16

j = 2 : 0 + 5 + 18

which would be the Einstein summation according to my understanding. Instead, the function does not perform the overall sum but stores the separate terms in a matrix, where we can spot the results of the products A_i * B_ij:

0   0   0   0
4   5   6   7
16  18  20  22

How is this actually controlled by the function? I feel this is controlled by the output labels, as mentioned in the documentation:

The output can be controlled by specifying output subscript labels as well. This specifies the label order, and allows summing to be disallowed or forced when desired

so somehow I assume that putting ->i disables summing of the inner sums. But how does this work exactly? This is not clear to me. Putting ->j provides the actual Einstein sum as expected.
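For reference, here is a quick check of the two output choices (re-defining A and B from above):

```python
import numpy as np

A = np.array([0, 1, 2])
B = np.array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11]])

# '->j' sums over the repeated index i, matching my manual computation
print(np.einsum('i,ij->j', A, B))   # [20 23 26 29]

# '->i' instead keeps i and sums over j
print(np.einsum('i,ij->i', A, B))   # [ 0 22 76]
```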

FenryrMKIII

1 Answer


It seems your understanding of the Einstein summation is not correct. The subscript operations you've written out have the multiplication correct, but the summation is over the wrong axis.

Think about what this means: np.einsum('i,ij->i', A, B).

  1. A has shape (i,) and B has shape (i, j).
  2. Multiply every column of B by A.
  3. Sum over the second axis of B, i.e., over the axis labeled j.

This gives an output of shape (i,) == (3,), while your subscript notation gives an output of shape (j,) == (4,). You're summing over the wrong axis.
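The three steps above can be sketched with explicit loops. This is just the semantics, not how einsum is actually implemented:

```python
import numpy as np

A = np.array([0, 1, 2])
B = np.array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11]])

# Semantics of np.einsum('i,ij->i', A, B): multiply, then sum over j
out = np.zeros(3, dtype=A.dtype)
for i in range(3):
    for j in range(4):
        out[i] += A[i] * B[i, j]  # i appears in the output; j is summed away

print(out)                         # [ 0 22 76]
print(np.einsum('i,ij->i', A, B))  # the same
```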

More detail:

Remember that the multiplication always happens first. The left-hand subscripts tell the np.einsum function which rows/columns/etc of the input arrays are to be multiplied with one another. Here, the repeated label i aligns A with the first axis of B, so a hypothetical "intermediate" array of products has shape (3, 4) == B.shape.

After multiplication, there is summation. This is controlled by which subscripts are omitted from the right-hand side. In this case, j is omitted, which means to sum along the last axis of the array (axis 1). Your version instead sums along axis 0, the axis labeled i.

If you instead wrote: np.einsum('i,ij->ij', A, B), there would be no summation, as no subscripts are omitted. Thus you'd get the array you've got at the end of your question.

Here are a couple of examples:

Ex 1:

No omitted subscripts, so no summation. Just multiply columns of B by A. This is the last array you've written out.

>>> (np.einsum('i,ij->ij', A, B) == (A[:, None] * B)).all()
True

Ex 2:

Same as the example. Multiply columns, then sum across the output's columns.

>>> (np.einsum('i,ij->i', A, B) == (A[:, None] * B).sum(axis=-1)).all()
True

Ex 3:

The sum as you've written it above. Multiply columns, then sum across the output's rows.

>>> (np.einsum('i,ij->j', A, B) == (A[:, None] * B).sum(axis=0)).all()
True

Ex 4:

Note that we can omit all output labels at the end, to just get the total sum across the whole array.

>>> np.einsum('i,ij->', A, B)
98
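This agrees with multiplying first and then summing every entry of the intermediate array:

```python
import numpy as np

A = np.array([0, 1, 2])
B = np.arange(12).reshape(3, 4)  # same B as above

# Multiply columns of B by A, then sum all entries
total = (A[:, None] * B).sum()
print(total)  # 98
```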

Ex 5:

Note that the summation really happens because we repeated the input label 'i'. If we instead use different labels for each axis of the input arrays, we can compute things similar to Kronecker products:

>>> np.einsum('i,jk', A, B).shape
(3, 3, 4)
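With no repeated labels there is nothing to sum, so every axis survives into the output. A broadcasting equivalent, as a sketch:

```python
import numpy as np

A = np.array([0, 1, 2])
B = np.arange(12).reshape(3, 4)  # same B as above

# result[i, j, k] == A[i] * B[j, k]; no label repeats, so nothing is summed
C = np.einsum('i,jk', A, B)
print(C.shape)                            # (3, 3, 4)
print((C == A[:, None, None] * B).all())  # True
```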

EDIT

The NumPy implementation of the Einstein sum differs a bit from the traditional definition. Technically, the Einstein sum doesn't have the idea of "output labels". Those are always implied by the repeated input labels.

From the docs: "Whenever a label is repeated, it is summed." So, traditionally, we'd write something like np.einsum('i,ij', A, B). This is equivalent to np.einsum('i,ij->j', A, B). The i is repeated, so it is summed, leaving only the axis labeled j. You can think of a call with no output labels as equivalent to specifying, as outputs, exactly the labels that are not repeated in the input. That is, the subscripts 'i,ij' are the same as 'i,ij->j'.
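A quick check of that equivalence:

```python
import numpy as np

A = np.array([0, 1, 2])
B = np.arange(12).reshape(3, 4)  # same B as above

# Implicit mode: the repeated label i is summed, leaving only j
implicit = np.einsum('i,ij', A, B)
explicit = np.einsum('i,ij->j', A, B)
print((implicit == explicit).all())  # True
```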

The output labels are an extension or augmentation implemented in NumPy, which allow the caller to force summation or to enforce no summation on an axis. From the docs: "The output can be controlled by specifying output subscript labels as well. This specifies the label order, and allows summing to be disallowed or forced when desired."

bnaecker
  • In fact, the issue I have is with this statement "Remember that the multiplication always happens first", which is also found in the other Stack Overflow answer. Where in the documentation is this properly explained? "The left-hand subscripts tell the np.einsum function which rows/columns/etc of the input arrays are to be multiplied with one another": where is this explained in the documentation? – FenryrMKIII Nov 18 '17 at 16:33
  • Saying that "multiplication always happens first" is not really in the docs, it's from the definition of the notation. See the Wikipedia page for more info (https://en.wikipedia.org/wiki/Einstein_notation#Statement_of_convention), and my edits for more details on the how the docs explain the idea of output subscripts. – bnaecker Nov 18 '17 at 17:07
  • 1
    I understand Einstein's notation. Used it a lot in fluid mechanics. As you say in your edit, it seems here the implementation deviates from the traditional Einstein notation. As you say, Einstein notation would be 'ij,i' alone and then you know that you sum over i while varying j. So what does "i,ij->i" means regarding traditional Einstein notation ? What does it do ? It suppresses the summation over i ? – FenryrMKIII Nov 18 '17 at 17:16
  • `'i,ij->i'`, forces the summation to be over the axis labeled `'j'`, after the multiplication. So this operation is not, strictly speaking, correct Einstein notation, in my understanding. The reason for allowing this alternate notation is probably flexibility, if I had to guess. It allows expressing not just inner/outer/matrix products, but more general operations, including matrix trace, element accumulation, tiling operations, and things similar to Kronecker products. – bnaecker Nov 18 '17 at 19:35
  • I have more or less wrapped my head around einsum(). In the end, I take it the following way: einsum() is not, properly speaking, the Einstein summation, as summation is not controlled by repeated indices but by the output. The Einstein notation used in einsum() only dictates which indices are coupled, i.e. they evolve together in an iteration loop, and which indices are free to evolve on their own, i.e. they have their own iteration loop. – FenryrMKIII Nov 20 '17 at 06:11
  • @FenryrMKIII Your summary is correct *if you use output indices*. If you do not, i.e., you write something like `'ij,j'`, then `einsum` follows traditional Einstein summation. – bnaecker Nov 25 '17 at 17:16