
While working through the Deep MNIST tutorial for TensorFlow, I ran into a problem with the output size after convolving and pooling the input image. In the tutorial we can see:

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1,28,28,1])

We then convolve x_image with the weight tensor, add the bias, apply 
the ReLU function, and finally max pool. The max_pool_2x2 method 
will reduce the image size to 14x14.

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

I think the input image is processed in two steps: first convolution, then max pooling. After convolution, the output size is (28-5+1)*(28-5+1) = 24*24, so the input to max pooling is 24*24. If the pool size is 2*2, the output size is (24/2)*(24/2) = 12*12 rather than 14*14. Does that make sense? Please explain in detail how to calculate the output size after convolution and pooling. Thanks a lot. The following image shows the CNN process described in a paper. [Image: the CNN process]

I now understand where the problem is.

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

padding='SAME' means the output size is the same as the input size, i.e. the image size. So after convolution the output size is 28*28, and the final output size after pooling is (28/2)*(28/2) = 14*14. But how should padding='SAME' be interpreted in the following code:

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                    strides=[1, 2, 2, 1], padding='SAME')
WangYang

4 Answers


Let's take an example.

Tensor size or shape: (width = 28, height = 28)

Convolution filter size (F): (F_width = 5, F_height = 5)

Padding (P): 0

Padding algorithm: VALID (no padding is added, so the output can be smaller than the input)

Stride (S): 1

Using the equation:

output width = ((W - F + 2*P) / S) + 1

output width = ((28 - 5 + 2*0) / 1) + 1

output width = 24

The same result holds for the output height, since the width and height are equal.

So the output dimension will be (24,24).

However, if the padding algorithm is set to SAME and the stride is 1, the size of the output is equal to the size of the original input.

Let us also remember that pooling is a form of "filter", and thus the filter equation above is applicable.

So a 2x2 pooling with a stride of 2, applied to a 28x28 input and using the same equation (((W - F + 2*P) / S) + 1), will give us:

output width = ((28 - 2 + 2*0) / 2) + 1 = (26 / 2) + 1 = 13 + 1 = 14
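The arithmetic above can be checked with a small sketch. The helper name `output_size` is hypothetical (it is not part of the tutorial or of TensorFlow), and the sketch assumes the division is rounded down when the stride does not divide evenly.

```python
def output_size(w, f, p=0, s=1):
    """Output width for input width w, filter size f, padding p, stride s.

    Hypothetical helper for illustration only.
    """
    # ((W - F + 2*P) / S) + 1, with the division rounded down
    return (w - f + 2 * p) // s + 1


# 5x5 convolution on a 28-wide input, no padding, stride 1
conv_valid = output_size(28, 5, p=0, s=1)

# 2x2 pooling with stride 2 on a 28-wide input
pool = output_size(28, 2, p=0, s=2)

print(conv_valid, pool)
```

The same function applies to the height, since the input and the filters are square here.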

Here is a link to the answer I once posted to Quora.

https://www.quora.com/How-can-I-calculate-the-size-of-output-of-convolutional-layer/answer/Rockson-Agyeman

rocksyne

The output size of a convolutional layer depends on the padding algorithm used. As you can see in the "Convolution and Pooling" section of the tutorial, they use the 'SAME' padding method. That means the input is padded with zeros outside the original input so that, with a stride of 1, the output shape is the same as the input shape.

Your estimate for the output shape is correct when you use the 'VALID' padding algorithm.
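As a rough sketch of the difference, the two padding modes can be compared in plain Python. The helper names `same_size` and `valid_size` are hypothetical; the formulas follow the output-size rules described in the TensorFlow documentation for 'SAME' and 'VALID' padding, as far as I understand them.

```python
import math


def same_size(n, stride):
    # 'SAME': the output size depends only on the input size and the stride
    return math.ceil(n / stride)


def valid_size(n, f, stride):
    # 'VALID': no padding, the filter must fit entirely inside the input
    return math.ceil((n - f + 1) / stride)


# Tutorial setup: 5x5 conv (stride 1) then 2x2 max pool (stride 2), both 'SAME'
conv_same = same_size(28, 1)
pool_same = same_size(conv_same, 2)

# The estimate from the question corresponds to 'VALID' padding
conv_valid = valid_size(28, 5, 1)
pool_valid = valid_size(conv_valid, 2, 2)

print(conv_same, pool_same)    # 'SAME' pipeline
print(conv_valid, pool_valid)  # 'VALID' pipeline
```

With 'SAME' the pipeline goes 28 to 28 to 14, which matches the tutorial; with 'VALID' it goes 28 to 24 to 12, which matches the estimate in the question.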

dseuss
  • I understand now. Thanks a lot. I should read the TensorFlow API more carefully. – WangYang May 26 '17 at 04:12
  • Please tell me where I can find a detailed description of the TensorFlow API, if you know. I am new to TensorFlow. Thanks again. – WangYang May 26 '17 at 04:19
  • The standard place to look is of course the official [API documentation](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d). A good explanation of the different convolutions can be found in the [Theano documentation](http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html) – dseuss May 26 '17 at 05:34
  • Got it! It was such a simple question. Sorry for troubling you. – WangYang May 26 '17 at 07:00

If you are using TensorFlow, you can find a more detailed discussion here: What is the difference between 'SAME' and 'VALID' padding in tf.nn.max_pool of tensorflow?

Vaibhav Dixit

Assume your image has size n*n and your convolution filter has size f*f.

If you use a valid convolution (meaning no padding, p = 0) with stride 1, the output size is (n-f+1)*(n-f+1).

If you want the output and input to have the same dimensions, use a same convolution with stride 1; the padding size is then p = (f-1)/2 (which is why f is usually odd).

For a general formula: if your input has size n*n, your convolution kernel has size f*f, the padding size is p and the stride is s, the output size is (((n+2p-f)/s)+1)*(((n+2p-f)/s)+1), with each division rounded down when it is not exact.

The first parenthesis is the height and the second is the width.
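A quick way to convince yourself of the p = (f-1)/2 rule is the sketch below. The helper name `conv_out` is hypothetical, and the general formula is assumed to round the division down.

```python
def conv_out(n, f, p, s):
    # General formula: ((n + 2p - f) / s) + 1, division rounded down
    return (n + 2 * p - f) // s + 1


n = 28
for f in (1, 3, 5, 7):
    p = (f - 1) // 2  # padding for a "same" convolution with odd f
    # With stride 1 the output size equals the input size
    assert conv_out(n, f, p, 1) == n

no_padding = conv_out(n, 5, 0, 1)  # valid convolution: n - f + 1
strided = conv_out(n, 5, 2, 2)     # same padding amount, but stride 2
print(no_padding, strided)
```

Note that the padding only preserves the size at stride 1; with stride 2 the same padding roughly halves it.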

Soroush Karimi