I am using PyTorch 1.8.1, and although I know newer versions have the padding="same" option, for some reason I do not want to upgrade. To implement 'same' padding for a CNN with stride 1 and dilation > 1, I set the padding as follows:
padding=(dilation*(cnn_kernel_size[0]-1)//2, dilation*(cnn_kernel_size[1]-1)//2)
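As a sanity check on that expression, this is the arithmetic I expect it to perform for my actual kernel size and dilation (plain Python, no torch needed; the helper name is my own, just for illustration):

```python
# 'Same' padding for stride 1: pad by dilation*(kernel_size-1)//2 per side.
def same_padding(kernel_size, dilation):
    return dilation * (kernel_size - 1) // 2

# With my values: kernel size (15, 15), dilation 5.
padding = (same_padding(15, 5), same_padding(15, 5))
print(padding)  # (35, 35)
```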
According to the PyTorch documentation, I expected the input and output sizes to be the same, but that is not what happened.
The PyTorch documentation gives:
Hout = ⌊(Hin + 2×padding[0] − dilation[0]×(kernel_size[0]−1) − 1) / stride[0] + 1⌋
Wout = ⌊(Win + 2×padding[1] − dilation[1]×(kernel_size[1]−1) − 1) / stride[1] + 1⌋
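In code, that formula is just the following (a plain-Python sketch; the function name is mine):

```python
import math

def conv2d_out_dim(in_size, padding, dilation, kernel_size, stride):
    # One spatial dimension of the Conv2d output-size formula from the docs.
    return math.floor(
        (in_size + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1
    )

# Quick check with a trivial case: a 3x3 kernel with padding 1, dilation 1,
# and stride 1 should preserve the input size.
print(conv2d_out_dim(10, padding=1, dilation=1, kernel_size=3, stride=1))  # 10
```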
The input to torch.nn.Conv2d had shape (1, 1, 625, 513), which, per the Conv2d documentation, means batch size = 1, C_in = 1, H_in = 625, and W_in = 513,
and after using:
- 64 filters
- kernel size of (15,15)
- stride = (1,1)
- dilation = 5
- padding = (35, 35)
Putting those values into the formulas above gives:
Hout = ⌊(625 + 2×35 − 5×(15−1) − 1) / 1 + 1⌋ = ⌊(625 + 70 − 70 − 1) / 1 + 1⌋ = 625
Wout = ⌊(513 + 2×35 − 5×(15−1) − 1) / 1 + 1⌋ = ⌊(513 + 70 − 70 − 1) / 1 + 1⌋ = 513
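To rule out an arithmetic slip on my part, I also evaluated the formula programmatically with exactly the values listed above (plain Python, no torch; the helper is just for this check):

```python
import math

def out_dim(in_size, padding, dilation, kernel_size, stride):
    # Conv2d output-size formula from the PyTorch docs, one spatial dimension.
    return math.floor(
        (in_size + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1
    )

h_out = out_dim(625, padding=35, dilation=5, kernel_size=15, stride=1)
w_out = out_dim(513, padding=35, dilation=5, kernel_size=15, stride=1)
print(h_out, w_out)  # 625 513
```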
However, the output shape reported by PyTorch was (1, 64, 681, 569).
I can understand the batch size of 1 and C_out = 64, but I don't know why H_out and W_out are not the same as H_in and W_in. Does anyone have an explanation?