
I'm fairly new to TensorFlow, and while learning it through some tutorials, I came across the following code:

if stride == 1:
    # let TensorFlow insert the zeros itself via "SAME" padding
    return slim.conv2d(inputs, num_outputs, kernel_size, stride=1, padding='SAME', scope=scope)
else:
    # pad manually: kernel_size - 1 zeros in total per spatial dimension,
    # split as evenly as possible between the two sides
    pad_total = kernel_size - 1
    pad_beg = pad_total // 2
    pad_end = pad_total - pad_beg
    inputs = tf.pad(inputs, [[0, 0], [pad_beg, pad_end], [pad_beg, pad_end], [0, 0]])
    return slim.conv2d(inputs, num_outputs, kernel_size, stride=stride, padding='VALID', scope=scope)

However, I have also learned that "SAME" padding means the output has the same size as the input, while "VALID" means it does not, and that tf.pad pads with zeros manually as well. So is there any difference between these two methods? Or what is the purpose of this tf.pad?

Fleur Malo
  • Possible duplicate of [What is the difference between 'SAME' and 'VALID' padding in tf.nn.max\_pool of tensorflow?](https://stackoverflow.com/questions/37674306/what-is-the-difference-between-same-and-valid-padding-in-tf-nn-max-pool-of-t) – Patwie Jul 16 '18 at 07:25
  • Not a duplicate, since it specifically asks whether manually padding the image is different, whereas the "duplicate" question only considers the differences between the padding modes. As far as I read from the definitions, you should be correct in your question, i.e. the two should produce the same results. The `pad` method still has its validity, since you could essentially pad with *any arbitrary value*, whereas `SAME` will use zeros every time (see the sketch below the comments). – dennlinger Jul 16 '18 at 07:27
  • Note that manual padding gives you slightly more control: in the case of an even filter size the padding has to be uneven, and the conv functions with "same" padding will (IIRC) always pad one more at the _end_ (right/bottom), whereas you might want it the other way around for some reason. – xdurch0 Jul 16 '18 at 07:31
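To illustrate the point about arbitrary padding values, here is a minimal sketch (assuming TensorFlow 1.x, where `tf.pad` accepts a `constant_values` argument as well as `REFLECT`/`SYMMETRIC` modes, none of which `'SAME'` can express):

    import tensorflow as tf

    x = tf.ones([1, 2, 2, 1])
    # pad one pixel on each spatial side with -1 instead of 0
    padded = tf.pad(x, [[0, 0], [1, 1], [1, 1], [0, 0]], constant_values=-1)

    with tf.Session() as sess:
        print(sess.run(padded)[0, :, :, 0])  # a -1 border around a 2x2 block of ones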

1 Answer


In many real-world use cases, there is no difference.

For instance, in some ImageNet architectures we often pad with 1 pixel and then apply a 3x3 convolution with stride 1. The network behaves the same whether you first zero-pad by 1 and then convolve, or convolve directly with "SAME" padding.
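As a quick sanity check, here is a minimal sketch of that equivalence (assuming TensorFlow 1.x and plain `tf.nn.conv2d` rather than the slim wrapper):

    import numpy as np
    import tensorflow as tf

    x = tf.constant(np.random.rand(1, 5, 5, 1), dtype=tf.float32)
    w = tf.constant(np.random.rand(3, 3, 1, 1), dtype=tf.float32)

    # "SAME" with stride 1 and a 3x3 kernel pads one zero on every side
    same = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')

    # manual zero-padding of one pixel per side, then "VALID"
    padded = tf.pad(x, [[0, 0], [1, 1], [1, 1], [0, 0]])
    valid = tf.nn.conv2d(padded, w, strides=[1, 1, 1, 1], padding='VALID')

    with tf.Session() as sess:
        a, b = sess.run([same, valid])
        print(np.allclose(a, b))  # True: identical outputs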

However, the behaviour will be different in non-standard situations. Remember that you can set the kernel size AND the stride AND the dilation rate for a convolution layer.

A counterexample where there is a difference between conv2d(SAME) and a symmetric tf.pad + conv2d(VALID):

Input: (6,6,1), kernel: (3,3), stride: (2,2)

conv2d(SAME) here behaves like tf.pad with 0 pixels on the left/top and 1 pixel on the right/bottom, and yields a (3,3,1) output. The manual scheme above instead pads 1 pixel on every side; its output is also (3,3,1), but every convolution window is shifted by one pixel, so the resulting values differ.
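A minimal sketch verifying this difference (same assumptions as above: TensorFlow 1.x, plain `tf.nn.conv2d`):

    import numpy as np
    import tensorflow as tf

    x = tf.constant(np.random.rand(1, 6, 6, 1), dtype=tf.float32)
    w = tf.constant(np.random.rand(3, 3, 1, 1), dtype=tf.float32)

    # "SAME" with stride 2 pads 0 pixels on top/left and 1 pixel on bottom/right
    same = tf.nn.conv2d(x, w, strides=[1, 2, 2, 1], padding='SAME')

    # the manual scheme pads symmetrically: 1 pixel on every side
    padded = tf.pad(x, [[0, 0], [1, 1], [1, 1], [0, 0]])
    valid = tf.nn.conv2d(padded, w, strides=[1, 2, 2, 1], padding='VALID')

    with tf.Session() as sess:
        a, b = sess.run([same, valid])
        print(a.shape, b.shape)   # both (1, 3, 3, 1)
        print(np.allclose(a, b))  # False: the windows are shifted by one pixel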

tyrex