In the paper 'Fully Convolutional Networks for Semantic Segmentation' the author distinguishes between input stride and output stride in the context of deconvolution. How do these terms differ from each other?
1 Answer
Input stride is the stride of the filter, i.e. how far you shift the filter at each step.
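To make the filter stride concrete, here is a minimal sketch in plain Python. It assumes an ordinary forward convolution over a 1-D input with no padding, and the helper name `filter_positions` is made up for illustration.

```python
def filter_positions(input_length, kernel_size, stride):
    """Start index of each filter application along a 1-D input (no padding)."""
    return list(range(0, input_length - kernel_size + 1, stride))


# With stride 1 the filter moves one element at a time.
print(filter_positions(7, 3, 1))  # [0, 1, 2, 3, 4]

# With stride 2 it skips every other position, so fewer outputs are produced.
print(filter_positions(7, 3, 2))  # [0, 2, 4]
```

A larger stride means fewer filter applications and therefore a smaller output.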
Output stride is actually a nominal value. In a CNN we get the final feature map after several convolution and max-pooling operations. Let's say our input image is 224 × 224 and our final feature map is 7 × 7.
Then we say our output stride is 224 / 7 = 32, i.e. approximately how much the image has been downsampled.
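The arithmetic can be checked with a few lines of Python. This is only a sketch: it assumes the network contains five stages that each halve the resolution (an assumption, not something stated above), and the function name `output_stride` is hypothetical.

```python
def output_stride(input_size, feature_map_size):
    """Ratio of input resolution to final feature-map resolution."""
    return input_size / feature_map_size


# Assumption: five stride-2 stages (convolution or pooling), each halving the size.
size = 224
for _ in range(5):
    size //= 2

print(size)                      # 7
print(output_stride(224, size))  # 32.0
```

Under that assumption, the output stride is the product of the individual strides along the network: 2 ** 5 = 32.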
This TensorFlow script describes what output stride is and how to use it in an FCN, which is the dense-prediction case:
one uses inputs with spatial dimensions that are multiples of 32 plus 1, e.g., [321, 321]. In this case the feature maps at the ResNet output will have spatial shape [(height - 1) / output_stride + 1, (width - 1) / output_stride + 1] and corners exactly aligned with the input image corners, which greatly facilitates alignment of the features to the image. Using as input [225, 225] images results in [8, 8] feature maps at the output of the last ResNet block.
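The shape formula in the quoted passage can be verified directly. The sketch below assumes the input sizes are a multiple of the output stride plus 1, so that integer division is exact; the function name `dense_feature_shape` is made up for illustration and is not part of the script.

```python
def dense_feature_shape(height, width, output_stride=32):
    """Feature-map shape from the formula
    [(height - 1) / output_stride + 1, (width - 1) / output_stride + 1]."""
    return ((height - 1) // output_stride + 1,
            (width - 1) // output_stride + 1)


print(dense_feature_shape(225, 225))  # (8, 8)
print(dense_feature_shape(321, 321))  # (11, 11)
```

The first result matches the [8, 8] feature maps quoted for [225, 225] inputs.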

hi @Shamane, thanks for this answer. Unfortunately, the script is not available anymore. could you re-provide it ? – desmond13 Nov 18 '21 at 10:28