How to implement maxpool: taking a maximum on sliding window on image or tensor

Question

In short: I am looking for a simple numpy (maybe oneliner) implementation of Maxpool - maximum on a window on numpy.narray for all location of the window across dimensions.

In more details: I am implementing a convolutional neural network ("CNN"), one of the typical layers in such a network is MaxPool layer (look for example here). Writing y = MaxPool(x, S), x is an input narray and S is a parameter, using pseudocode, the output of the MaxPool is given by:

     y[b,h,w,c] = max(x[b, s*h + i, s*w + j, c]) over i = 0,..., S-1; j = 0,...,S-1.

That is, y is narray where the value at indexes b,h,w,c equals the maximum taken over the window of size S x S along the second and the third dimension of the input x, the window "corner" is placed at the indexes b,h,w,c.

Some additional details: The network is implemented using numpy. CNN has many "layers" where output of one layer is the input to the next layer. The input to a layers are numpy.narrays called "tensors". In my case tensors are 4-dimensional numpy.narray's, x. That is x.shape is a tuple (B,H,W,C). Each size of dimensions changes after the tensor is process by a layer, for example the input to layer i= 4 can have size B = 10, H = 24, W = 24, C = 3, while the output, aka input to i+1 layer has B = 10, H = 12, W = 12, C = 5. As indicated in the comments the size after application of MaxPool is (B, H - S + 1, W - S + 1, C).

For a concreteness: if I use

import numpy as np

y = np.amax(x, axis = (1,2))

where x.shape is say (2,3,3,4) this will give me what I want but for a degenerate case where the window I am maximizing over is of the size 3 x 3, the size of the second and third dimension of x, which is not exactly what I want.

So, the output would be of shape : `(B,H-S+1,W-S+1,C)`, right? — Divakar, Jan 26 '17 at 20:31
@Divakar Yes, you are right, the output depends on how many times you can fit in the sliding window — them, Jan 26 '17 at 20:32
[`scipy.ndimage.maximum_filter1d`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.maximum_filter1d.html) implements an O(n) algo (independent of window size) — user7138814, Jan 27 '17 at 12:27

Divakar · Accepted Answer · 2017-01-26T20:42:08.537

Here's a solution using np.lib.stride_tricks.as_strided to create sliding windows resulting in a 6D array of shape : (B,H-S+1,W-S+1,S,S,C) and then simply performing max along the fourth and fifth axes, resulting in an output array of shape : (B,H-S+1,W-S+1,C). The intermediate 6D array would be a view into the input array and as such won't occupy anymore memory. The subsequent operation of max being a reduction would efficiently utilize the sliding views.

Thus, an implementation would be -

# Based on http://stackoverflow.com/a/41850409/3293881
def patchify(img, patch_shape):
    a, X, Y, b = img.shape
    x, y = patch_shape
    shape = (a, X - x + 1, Y - y + 1, x, y, b)
    a_str, X_str, Y_str, b_str = img.strides
    strides = (a_str, X_str, Y_str, X_str, Y_str, b_str)
    return np.lib.stride_tricks.as_strided(img, shape=shape, strides=strides)

out = patchify(x, (S,S)).max(axis=(3,4))

Sample run -

In [224]: x = np.random.randint(0,9,(10,24,24,3))

In [225]: S = 5

In [226]: np.may_share_memory(patchify(x, (S,S)), x)
Out[226]: True

In [227]: patchify(x, (S,S)).shape
Out[227]: (10, 20, 20, 5, 5, 3)

In [228]: patchify(x, (S,S)).max(axis=(3,4)).shape
Out[228]: (10, 20, 20, 3)

for non-overlapping patches, i.e, for `(a, ((X-x)/x)+1, (Y-y/y)+1, x, y, b)` or simply `(a, X/x, Y/y, x, y, b)`, what strides do I use? Is there a simple way to correlate conv stride (1,2,3.., window_size or full strides) with the strides used by numpy as_strided? — Saravanabalagi Ramachandran, Nov 24 '17 at 00:57
@SaravanabalagiRamachandran Might be simpler to use scikit-image's view_as_blocks I would think. — Divakar, Nov 24 '17 at 06:30

How to implement maxpool: taking a maximum on sliding window on image or tensor

1 Answers1