From https://keras.io/layers/convolutional/
"same" results in padding the input such that the output has the same
length as the original input.
From https://keras.io/layers/pooling/
pool_size: integer or tuple of 2 integers, factors by which to downscale (vertical, horizontal). (2, 2) will halve the input in both spatial dimension. If only one integer is specified, the same window length will be used for both dimensions.
So first, why use padding at all? For a convolution it matters because we want every pixel, including those at the edges and corners of the image, to sit at the "center" of the kernel at some point; a kernel may be looking for important behavior right at those borders. So we pad around the edges for Conv2D, and (with the default stride of 1) it returns an output the same size as the input.
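You can check this directly by comparing the two padding modes (a minimal sketch; the imports assume standalone Keras, use tensorflow.keras if that is what you are on):

from keras.layers import Input, Conv2D
from keras.models import Model

inp = Input(shape=(28, 28, 1))
same = Conv2D(16, (3, 3), padding='same')(inp)    # padded: every pixel gets to be a window center
valid = Conv2D(16, (3, 3), padding='valid')(inp)  # unpadded: the window stays inside the image
print(Model(inp, same).output_shape)   # (None, 28, 28, 16)
print(Model(inp, valid).output_shape)  # (None, 26, 26, 16)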
However, in the case of the MaxPooling2D layer we are padding for similar reasons, but now the stride comes into play: in Keras, a pooling layer's strides default to its pool_size. Since your pooling size is (2, 2), the window moves two pixels at a time and your image is roughly halved each time it goes through a pooling layer.
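You can see what the padding does on an odd-sized input (again a minimal sketch, same import caveat as above):

from keras.layers import Input, MaxPooling2D
from keras.models import Model

inp = Input(shape=(7, 7, 8))
same = MaxPooling2D((2, 2), padding='same')(inp)    # pads the odd edge so it is kept: (4, 4, 8)
valid = MaxPooling2D((2, 2), padding='valid')(inp)  # drops the odd edge: (3, 3, 8)
print(Model(inp, same).output_shape)   # (None, 4, 4, 8)
print(Model(inp, valid).output_shape)  # (None, 3, 3, 8)

Back to your tutorial's encoder: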
from keras.layers import Input, Conv2D, MaxPooling2D

input_img = Input(shape=(28, 28, 1))  # adapt this if using `channels_first` image data format

x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)  # -> (28, 28, 16)
x = MaxPooling2D((2, 2), padding='same')(x)                           # -> (14, 14, 16)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # -> (14, 14, 8)
x = MaxPooling2D((2, 2), padding='same')(x)                           # -> (7, 7, 8)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # -> (7, 7, 8)
encoded = MaxPooling2D((2, 2), padding='same')(x)                     # -> (4, 4, 8)

# at this point the representation is (4, 4, 8) i.e. 128-dimensional
So in the case of your tutorial example, the spatial dimensions go 28 -> 14 -> 7 -> 4, with each arrow representing a pooling layer. Note the last step: 7 is not evenly divisible by 2, and this is exactly where padding='same' kicks in; the 7x7 input is padded to 8x8, so the output is ceil(7/2) = 4 rather than 3.
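The general rule: with padding='same' a pooled (or strided) dimension becomes ceil(input_size / stride), while padding='valid' gives floor((input_size - pool_size) / stride) + 1. A quick arithmetic check of the whole chain (plain Python):

import math

size = 28
for _ in range(3):               # one iteration per (2, 2) pooling layer
    size = math.ceil(size / 2)   # padding='same': output = ceil(input / stride)
    print(size)                  # prints 14, then 7, then 4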