I need to divide a 2D matrix into a set of 2D patches with a certain stride, then multiply every patch by its center element and sum the elements of each patch.
It resembles a convolution in which a separate kernel is used for every element of the matrix.
Below is a visual illustration. The elements of the result matrix are calculated like this:
The result should look like this:
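To pin down the operation in code, here is a naive loop-based sketch. The function name `weighted_patch_sums` is made up for illustration, and it assumes the same convention as the snippet further down: stride 1, each window anchored at its top-left element (which is what the multiplier ends up being for a 2x2 window), and zero padding on the bottom and right edges.

```python
import numpy as np

def weighted_patch_sums(m, window_shape=(2, 2)):
    """Hypothetical reference: sum each window, scale by its anchor element."""
    wh, ww = window_shape
    # Zero-pad the bottom and right edges so every element anchors a full window
    padded = np.pad(m, ((0, wh - 1), (0, ww - 1)))
    out = np.empty_like(m)
    rows, cols = m.shape
    for i in range(rows):
        for j in range(cols):
            # Window starting at (i, j), multiplied by the element at (i, j)
            out[i, j] = m[i, j] * padded[i:i + wh, j:j + ww].sum()
    return out

m = np.arange(1, 17).reshape((4, 4))
expected = weighted_patch_sums(m)
print(expected)
# [[ 14  36  66  48]
#  [150 204 266 160]
#  [414 500 594 336]
#  [351 406 465 256]]
```

For example, the top-left entry is 1 * (1 + 2 + 5 + 6) = 14, and the bottom-right entry is 16 * (16 + 0 + 0 + 0) = 256 because its window falls mostly into the padding.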
Here's a solution I came up with:
import numpy as np
window_shape = (2, 2)
stride = 1
# Matrix
m = np.arange(1, 17).reshape((4, 4))
# Zero-pad the end of each axis by one element so that the number
# of windows equals the number of elements
m_padded = np.pad(m, (0, 1))
# This function divides the array into `windows`, from:
# https://stackoverflow.com/questions/45960192/using-numpy-as-strided-function-to-create-patches-tiles-rolling-or-sliding-w#45960193
w = window_nd(m_padded, window_shape, stride)
ww, wh, *_ = w.shape
w = w.reshape((ww * wh, 4))  # The product of the first two dimensions is the number of rows
# Repeat each element once per window entry for element-wise multiplication
m_tiled = np.tile(m.ravel(), (4, 1)).transpose()
result = (w * m_tiled).sum(axis=1).reshape(m.shape)
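`window_nd` is the helper from the linked answer and is not reproduced here. To run the snippet without copying it, one possible stand-in is sketched below. It is an assumption on my part, not the linked implementation: it relies on `np.lib.stride_tricks.sliding_window_view` (NumPy 1.20 or newer), only handles a single integer stride applied to every axis, and returns a read-only view.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_nd(a, window, steps=1):
    """Stand-in for the linked helper (assumed behaviour, not the original code)."""
    # All stride-1 windows: shape (rows - wh + 1, cols - ww + 1, wh, ww) for 2D input
    view = sliding_window_view(a, window)
    # Keep every `steps`-th window along each leading (window-position) axis
    index = (slice(None, None, steps),) * a.ndim
    return view[index]

m = np.arange(1, 17).reshape((4, 4))
m_padded = np.pad(m, (0, 1))
w = window_nd(m_padded, (2, 2), 1)
print(w.shape)  # (4, 4, 2, 2)
```

With this definition in place before the snippet above, `w.reshape((ww * wh, 4))` works as written; the reshape copies the data, so the read-only flag of the view does not get in the way.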
I don't think this is very efficient, since several arrays are allocated in the intermediate steps.
What is a better or more efficient way to accomplish this?