I haven't used this new out
parameter before, but have worked with einsum
in the past, and have a general idea of how it works (or at least used to).
It looks to me like it initializes the out
array to zero before the start of iteration. That would account for all the 0s in the A[:,1:,:]
block. If instead I initial separate out
array, the desired values are inserted
In [471]: B = np.ones((3,4,3),int)
In [472]: np.einsum('j, ijk->ijk', P, A[:, 1:, :], out=B[:,1:,:])
Out[472]:
array([[[ 3, 4, 5],
[ 12, 14, 16],
[ 27, 30, 33]],
[[ 15, 16, 17],
[ 36, 38, 40],
[ 63, 66, 69]],
[[ 27, 28, 29],
[ 60, 62, 64],
[ 99, 102, 105]]])
In [473]: B
Out[473]:
array([[[ 1, 1, 1],
[ 3, 4, 5],
[ 12, 14, 16],
[ 27, 30, 33]],
[[ 1, 1, 1],
[ 15, 16, 17],
[ 36, 38, 40],
[ 63, 66, 69]],
[[ 1, 1, 1],
[ 27, 28, 29],
[ 60, 62, 64],
[ 99, 102, 105]]])
The Python portion of einsum
doesn't tell me much, except how it decides to pass the out
array to the c
portion, (as one of the list of tmp_operands
):
c_einsum(einsum_str, *tmp_operands, **einsum_kwargs)
I know that it sets up a c-api
equivalent of np.nditer
, using the str
to define the axes and iterations.
It iterates something like this section in the iteration tutorial:
https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.nditer.html#reduction-iteration
Notice in particular the it.reset()
step. That sets the out
buffer to 0 prior to iterating. It then iterates over the elements of input arrays and the output array, writing the calculation values to the output element. Since it is doing a sum of products (e.g. out[:] += ...
), it has to start with a clean slate.
I'm guessing a bit as to what is actually going on, but it seems logical to me that it should zero out the output buffer to start with. If that array is the same as one of the inputs, that will end up messing with the calculation.
So I don't think this approach will work and save you memory. It needs a clean buffer to accumulate the results in. Once that's done it, or you, can write the values back into A
. But given the nature of a dot
like product, you can't use the same array for input and for output.
In [476]: A[:,1:,:] = np.einsum('j, ijk->ijk', P, A[:, 1:, :])
In [477]: A
Out[477]:
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 12, 14, 16],
[ 27, 30, 33]],
....)