2

I have this piece of code:

import numpy as np

# m, n, o are the (integer) array dimensions, defined elsewhere
other = np.random.rand(m, n, o)
prev = np.random.rand(m, n, o, m, n, o)
mu = np.zeros((m, n, o, m, n, o))
for c in range(m):
    for i in range(n):
        for j in range(o):
            mu[c, i, j, c, i, j] = other[c, i, j] * prev[c, i, j, c, i, j]

I'd like to simplify it using einsum notation (and possibly save time by skipping the Python for loops). However, after a few tries I'm still not sure how to approach the problem. My current attempt is:

np.einsum('cijklm,cij->cijklm', prev, other)

It does not achieve the same result as the for-loop version of the code.

  • 2
    `np.einsum('cijcij,cij->cij', prev, other)` produces the same non-zero values. Your `mu` just has them distributed on 3d 'diagonals'. I learned this by testing your code, but I think you really should include the error traceback for your `einsum`. Don't just say it "does not work". – hpaulj Feb 04 '22 at 23:45
  • There's no error traceback to show, but I'll add more information, thanks! – Sebastienwood Feb 04 '22 at 23:52
  • Oops, I was testing a case where I repeated the `cij` - see my answer. – hpaulj Feb 04 '22 at 23:59

2 Answers

2

With shapes (2,3,4), I get:

In [52]: mu.shape
Out[52]: (2, 3, 4, 2, 3, 4)

This einsum expression complains that dimensions are repeated in the output:

In [53]: np.einsum('cijcij,cij->cijcij', prev, other).shape
Traceback (most recent call last):
  File "<ipython-input-53-92862a0865a2>", line 1, in <module>
    np.einsum('cijcij,cij->cijcij', prev, other).shape
  File "<__array_function__ internals>", line 180, in einsum
  File "/usr/local/lib/python3.8/dist-packages/numpy/core/einsumfunc.py", line 1359, in einsum
    return c_einsum(*operands, **kwargs)
ValueError: einstein sum subscripts string includes output subscript 'c' multiple times

Without the repeat:

In [55]: x=np.einsum('cijcij,cij->cij', prev, other)
In [56]: x.shape
Out[56]: (2, 3, 4)

Nonzero values match:

In [57]: np.allclose(mu[np.nonzero(mu)].ravel(), x.ravel())
Out[57]: True

Or by extracting the diagonals from mu:

In [59]: I,J,K = np.ix_(np.arange(2),np.arange(3),np.arange(4))
In [60]: mu[I,J,K,I,J,K].shape
Out[60]: (2, 3, 4)
In [61]: np.allclose(mu[I,J,K,I,J,K],x)
Out[61]: True

Your einsum satisfies the same 'diagonals' test:

In [68]: y=np.einsum('cijklm,cij->cijklm', prev, other)
In [69]: y.shape
Out[69]: (2, 3, 4, 2, 3, 4)
In [70]: np.allclose(y[I,J,K,I,J,K],x)
Out[70]: True

So the mu values are also present in y, just distributed in a different way, but the arrays are too big to readily view and compare.

OK, the 'diagonal' entries y[i,j,k,i,j,k] equal x[i,j,k]. In mu most values are 0, with only these selected diagonals being nonzero.

While einsum can generate the same nonzero values, it cannot, on its own, place them on the 3d 'diagonals' the way your loop does.

Changing your mu calculation to produce a 3d array:

In [76]: nu = np.zeros((m,n,o))
    ...: for c in range(m):
    ...:    for i in range(n):
    ...:       for j in range(o):
    ...:          nu[c,i,j] = other[c,i,j]*prev[c,i,j,c,i,j]
    ...: 
In [77]: np.allclose(nu,x)
Out[77]: True

edit

We can assign the einsum result to the diagonals with:

In [134]: out = np.zeros((2,3,4,2,3,4))
In [135]: out[I,J,K,I,J,K] = x
In [136]: np.allclose(out, mu)
Out[136]: True

Conceptually this may be simpler than the as_strided solution, and it may be just as fast: as_strided, while it does produce a view, is not as fast as a reshape-style view.

In [143]: %%timeit
     ...: out = np.zeros((m, n, o, m, n, o))
     ...: mu_view = np.lib.stride_tricks.as_strided(out,
     ...:                      shape=(m, n, o),
     ...:                      strides=[sum(mu.strides[i::3]) for i in range(3)])
     ...: np.einsum('cijcij,cij->cij', prev, other, out=mu_view)
31.6 µs ± 69.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [144]: %%timeit
     ...: out = np.zeros((2,3,4,2,3,4))
     ...: out[I,J,K,I,J,K] = np.einsum('cijcij,cij->cij', prev, other)
18.5 µs ± 178 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Or including the I,J,K generation in the timed loop:

In [146]: %%timeit
     ...: I,J,K = np.ix_(np.arange(2),np.arange(3),np.arange(4))
     ...: out = np.zeros((2,3,4,2,3,4))
     ...: out[I,J,K,I,J,K] = np.einsum('cijcij,cij->cij', prev, other)
40.4 µs ± 1.45 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

or creating the IJK directly:

In [151]: %%timeit
     ...: I,J,K = np.arange(2)[:,None,None],np.arange(3)[:,None],np.arange(4)
     ...: out = np.zeros((2,3,4,2,3,4))
     ...: out[I,J,K,I,J,K] = np.einsum('cijcij,cij->cij', prev, other)
25.1 µs ± 38.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
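
Putting the assign-to-diagonals pieces above together in one place (a minimal sketch of the same approach; the helper name fill_block_diagonal is mine, not part of the original code):

import numpy as np

def fill_block_diagonal(prev, other):
    # mu[c,i,j,c,i,j] = other[c,i,j] * prev[c,i,j,c,i,j], zeros elsewhere,
    # i.e. the same result as the triple loop in the question
    m, n, o = other.shape
    I, J, K = np.ix_(np.arange(m), np.arange(n), np.arange(o))
    mu = np.zeros((m, n, o, m, n, o), dtype=prev.dtype)
    mu[I, J, K, I, J, K] = np.einsum('cijcij,cij->cij', prev, other)
    return mu

With the arrays used in this answer, np.allclose(fill_block_diagonal(prev, other), mu) should come out True.
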
hpaulj
  • Thanks for the answer! However, I need to keep exactly the same results as the original code while speeding it up (ours has m=16, n=32, o=32: that's lengthy!) – Sebastienwood Feb 05 '22 at 19:04
  • 1
    I added an assign-to-diagonals step using the `I,J,K` arrays. – hpaulj Feb 05 '22 at 20:47
1

It is not possible to get this result using np.einsum() alone, but you can try this:

import numpy as np
from numpy.lib.stride_tricks import as_strided

m, n, o = 2, 3, 5
np.random.seed(0)
other = np.random.rand(m, n, o)
prev = np.random.rand(m, n, o, m, n, o)
mu = np.zeros((m, n, o, m, n, o))

mu_view = as_strided(mu,
                     shape=(m, n, o),
                     strides=[sum(mu.strides[i::3]) for i in range(3)]
                     )
np.einsum('cijcij,cij->cij', prev, other, out=mu_view)

The array mu should then be the same as the one produced by the nested-loop code in the question.
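
As a quick sanity check (a sketch; mu_loop below is just the question's nested loops rerun on the same arrays):

# rebuild the reference result with the question's triple loop
mu_loop = np.zeros((m, n, o, m, n, o))
for c in range(m):
    for i in range(n):
        for j in range(o):
            mu_loop[c, i, j, c, i, j] = other[c, i, j] * prev[c, i, j, c, i, j]

print(np.allclose(mu, mu_loop))  # expected: True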

Some explanation. Regardless of the shape of a numpy array, internally its elements are stored in a contiguous block of memory. Part of the structure of an array is its strides, which specify how many bytes one needs to jump when one of the indices of an array element is incremented by 1. Thus, in a 2-dimensional array arr, arr.strides[0] is the number of bytes separating the element arr[i, j] from arr[i+1, j], and arr.strides[1] is the number of bytes separating arr[i, j] from arr[i, j+1]. Using the strides information numpy can locate any element of an array from its indices. See e.g. this post for more details.
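
For instance, for a small C-contiguous float64 array (a short illustration of the above, not from the original answer):

import numpy as np

arr = np.zeros((3, 4))    # float64, C-contiguous
print(arr.itemsize)       # 8 (bytes per element)
print(arr.strides)        # (32, 8): 4 elements * 8 bytes to step along axis 0,
                          #          8 bytes to step along axis 1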

numpy.lib.stride_tricks.as_strided is a function that creates a view of a given array with custom-made strides. By specifying the strides, one can change which array element corresponds to which indices. In the code above this is used to create mu_view, a view of mu with the property that the element mu_view[c, i, j] is the element mu[c, i, j, c, i, j]. This is done by specifying the strides of mu_view in terms of the strides of mu. For example, the distance between mu_view[c, i, j] and mu_view[c+1, i, j] is set to be the distance between mu[c, i, j, c, i, j] and mu[c+1, i, j, c+1, i, j], which is mu.strides[0] + mu.strides[3].
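
One small check (an illustration added here, not part of the original answer) that mu_view really aliases the 'diagonal' entries of mu; note the assignment overwrites one entry, so rerun the einsum line afterwards if you need mu intact:

# the view's strides are the pairwise sums of mu's strides
print(mu_view.strides == (mu.strides[0] + mu.strides[3],
                          mu.strides[1] + mu.strides[4],
                          mu.strides[2] + mu.strides[5]))   # True

# writing through the view lands on the matching 'diagonal' entry of mu
mu_view[1, 2, 3] = 99.0
print(mu[1, 2, 3, 1, 2, 3])   # 99.0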

bb1
  • Something like this occurred to me as well, though I didn't try to implement it. It may be too advanced for this user. – hpaulj Feb 05 '22 at 17:07
  • Thanks, it works perfectly. As @hpaulj pointed out, I'm not familiar with strides (but now it's a good opportunity to look into them). Could you maybe explain why and how it works in this case? And do you by any chance know how to replicate it in PyTorch? It seems they don't support the out parameter for einsum (though it's easy to reproduce the strided view). – Sebastienwood Feb 05 '22 at 19:07
  • 1
    @Sebastienwood I added some explanation to the answer. The code in the answer would also work with the last line replaced by `mu_view[:] = np.einsum('cijcij,cij->cij', prev, other)`. I guess with such a change it may be adapted to PyTorch. – bb1 Feb 06 '22 at 02:28