
TL;DR: How can I modify my code given below to incorporate the padding = 'same' method?

I was trying to build my own CNN using NumPy and got confused by the two different answers I found for padding = 'same'.

This Answer says that

padding='Same' in Keras means padding is added as required to make up for overlaps when the input size and kernel size do not perfectly fit

So according to this, 'same' means the minimum padding required in each direction. If that's the case, shouldn't it be applied equally on both sides? For example, if the minimum required padding were 2, it could be distributed equally across all 4 sides. But what if the required padding were 3? What happens then?

Also, what bothers me is the official TensorFlow documentation, which says:

"same" results in padding with zeros evenly to the left/right or up/down of the input such that output has the same height/width dimension as the input.

So what is the right answer?

Here is the code that I have written for padding

import numpy as np
from typing import Union

def add_padding(X:np.ndarray, pad_size:Union[int,list,tuple], pad_val:int=0)->np.ndarray:
    '''
    Pad the input image array equally from all sides
    args:
        X: Input image in the form of [Batch, Height, Width, Channels]
        pad_size: How much padding should be done. If int, equal padding will be done on every side. Else specify how much to pad each dimension as (height_pad, width_pad) OR (y_pad, x_pad)
        pad_val: The value to pad with. Usually it is 0 (zero padding)
    return:
        Padded numpy array image
    '''
    assert (len(X.shape) == 4), "Input image should be of the form [Batch, Height, Width, Channels]"
    if isinstance(pad_size, int):
        y_pad = x_pad = pad_size
    else:
        y_pad = pad_size[0]
        x_pad = pad_size[1]

    pad_width = ((0,0), (y_pad,y_pad), (x_pad,x_pad), (0,0)) # Do not pad the batch (1st) or channel (4th) axis. Pad the height (2nd) axis with y_pad and the width (3rd) axis with x_pad
    return np.pad(X, pad_width=pad_width, mode='constant', constant_values=(pad_val, pad_val))
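
For reference, here is a quick usage check of the function above (the shapes are purely illustrative):

img = np.ones((1, 5, 5, 3))            # one 5x5 RGB image
print(add_padding(img, 1).shape)       # (1, 7, 7, 3): one zero added on every side
print(add_padding(img, (2, 1)).shape)  # (1, 9, 7, 3): 2 rows top/bottom, 1 column left/right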


# Another part of my Layer
# New Height/Width is dependent on the old height/ width, stride, filter size, and amount of padding
h_new = int((h_old + (2 * padding_size) - filter_size) / self.stride) + 1
w_new = int((w_old + (2 * padding_size) - filter_size) / self.stride) + 1
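
As a quick check of that formula (with example numbers of my own, not values from the layer): h_old = 5, filter_size = 3, stride = 1 and padding_size = 1 give h_new = (5 + 2 - 3)/1 + 1 = 5, i.e. the output keeps the input height, which is exactly the 'same' behaviour when the stride is 1.

h_old, filter_size, stride, padding_size = 5, 3, 1, 1   # illustrative values only
h_new = int((h_old + (2 * padding_size) - filter_size) / stride) + 1
print(h_new)  # 5 -> output height equals input height when stride == 1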

Full Code for this layer is presented here

Deshwal
  • Actually both of them are true. padding='same' specifies padding such that the output shape is equal to the input shape if the stride is 1. But if you specify a different stride, you will get a different result. – Kaveh Jun 18 '21 at 13:06
  • I don’t know what the problem is. The documentation is perfectly clear: padding with zeros so that the output has the same size as the input. It doesn’t matter how much you pad, the “minimally” is redundant. If you pad more, you’re just wasting space. The output size is given. – Cris Luengo Jun 18 '21 at 13:49

1 Answer


According to this SO answer, the name 'SAME' padding comes from the property that, when the stride equals 1, the output spatial shape is the same as the input spatial shape.

However, that is not the case when stride doesn't equal one. The output spatial shape is determined by the following formula.

In all cases, 'SAME' is defined so that TensorFlow applies the padding such that

For each spatial dimension i,
output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides[i])
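
As a quick illustration with the numbers used in the example further below (input spatial shape [2000, 125], strides [4, 1]), this rule fixes the output lengths before any padding is chosen:

import math
print(math.ceil(2000 / 4), math.ceil(125 / 1))  # 500 125 -> required output lengths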

So how does TensorFlow actually apply that padding?

First, the total padding needed for each spatial dimension is determined by the following algorithm.

# e.g. for a 2D image, num_spatial_dim=2
def get_padding_needed(input_spatial_shape, filter_shape, strides):
  num_spatial_dim = len(input_spatial_shape)
  padding_needed = [0] * num_spatial_dim

  for i in range(num_spatial_dim):
    if input_spatial_shape[i] % strides[i] == 0:
      # total padding when the input length divides evenly by the stride
      padding_needed[i] = max(filter_shape[i] - strides[i], 0)
    else:
      # total padding when it does not
      padding_needed[i] = max(filter_shape[i] - (input_spatial_shape[i] % strides[i]), 0)

  return padding_needed

# example
print(get_padding_needed(input_spatial_shape=[2000,125], filter_shape=[8,4], strides=[4,1]))
# [4,3]

As you can see, the padding needed for the first spatial dimension is an even number, 4. That's simple: just pad 2 zeros at each end of that dimension.

Second, the padding needed for the second spatial dimension is an odd number, 3. In that case, TensorFlow pads fewer zeros at the starting end.

In other words, if the dimension is height and the padding needed is 3, it will pad 1 zero at the top and 2 zeros at the bottom. If the dimension is width and the padding needed is 5, it will pad 2 zeros on the left and 3 zeros on the right, etc.
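
Putting the two steps together, here is a minimal sketch (my own illustration, not code from either post; the pad_same name is made up) of how the total padding returned by get_padding_needed above could be split and applied with np.pad, putting the smaller half before and the larger half after, as described:

import numpy as np

def pad_same(X, filter_shape, strides, pad_val=0):
  # X is assumed to be [Batch, Height, Width, Channels]
  padding_needed = get_padding_needed(X.shape[1:3], filter_shape, strides)
  pad_width = [(0, 0)]                      # do not pad the batch axis
  for p in padding_needed:
    pad_width.append((p // 2, p - p // 2))  # smaller half first (start), larger half last (end)
  pad_width.append((0, 0))                  # do not pad the channel axis
  return np.pad(X, pad_width, mode='constant', constant_values=pad_val)

# example: total padding of 3 per spatial dimension -> 1 zero before, 2 zeros after
print(pad_same(np.ones((1, 5, 5, 1)), filter_shape=[4, 4], strides=[1, 1]).shape)  # (1, 8, 8, 1)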

References:

  1. https://www.tensorflow.org/api_docs/python/tf/nn/convolution
  2. https://mmuratarat.github.io/2019-01-17/implementing-padding-schemes-of-tensorflow-in-python
Laplace Ricky
  • For reference #1, currently this has more details https://www.tensorflow.org/api_docs/python/tf/nn#notes_on_padding_2 – Joe Jan 21 '22 at 19:58