While working through the Deep MNIST TensorFlow tutorial, I ran into a question about the output size after convolving and pooling the input image. In the tutorial we see:
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1,28,28,1])
We then convolve x_image with the weight tensor, add the bias, apply
the ReLU function, and finally max pool. The max_pool_2x2 method
will reduce the image size to 14x14.
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
I think there are two steps applied to the input image: first convolution, then max pooling. After the convolution, the output size should be (28-5+1)*(28-5+1) = 24*24. The input to max pooling is therefore 24*24, and with a 2*2 pool the output size is (24/2)*(24/2) = 12*12, not 14*14. Does that make sense? Please explain in detail how to calculate the output size after convolution and pooling. Thanks a lot. The following image shows the CNN process from a paper: [image of the CNN process]
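For reference, this is the arithmetic I am doing. It is a small Python sketch of the standard no-padding (VALID) output-size formula; the helper name `conv_out_valid` is my own:

```python
import math

def conv_out_valid(n, f, s=1):
    # VALID padding: no zero padding, the filter must fit entirely
    # inside the input, so out = floor((n - f) / s) + 1
    return math.floor((n - f) / s) + 1

conv = conv_out_valid(28, 5)          # (28 - 5) / 1 + 1 = 24
pool = conv_out_valid(conv, 2, s=2)   # (24 - 2) / 2 + 1 = 12
print(conv, pool)  # 24 12
```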
I have figured out where the problem is.
def conv2d(x, W):
return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
The padding='SAME' means the output size equals the input size (here, the image size): TensorFlow zero-pads the input so that, with stride 1, the convolution output is still 28*28, and the final output after pooling is (28/2)*(28/2) = 14*14. But how do we explain padding='SAME' in the following code:
def max_pool_2x2(x):
return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
strides=[1, 2, 2, 1], padding='SAME')