PyTorch convolutions are actually implemented as cross-correlations. This shouldn't cause issues when training a convolution layer, since one is just a flipped version of the other (and hence the learned function is equally expressive), but it does become an issue when:
- trying to implement an actual convolution with the `functional` library (see the sketch after this list)
- trying to copy the weights of an actual convolution from another deep learning library
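
For example, here is a minimal sketch of the first point; as far as I can tell, getting a true convolution out of `torch.nn.functional.conv2d` means flipping the kernel's spatial dimensions yourself:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 5, 5)   # (batch, in_channels, H, W)
k = torch.randn(1, 1, 3, 3)   # (out_channels, in_channels, kH, kW)

# F.conv2d slides the kernel over the input without flipping it,
# i.e. it computes a cross-correlation.
cross_corr = F.conv2d(x, k, padding=1)

# To get the mathematical convolution, flip the kernel along its
# two spatial dims (kH, kW) first.
true_conv = F.conv2d(x, torch.flip(k, dims=[2, 3]), padding=1)

# The two generally differ (they coincide only for symmetric kernels).
print(torch.allclose(cross_corr, true_conv))
```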
The authors say the following in *Deep Learning with PyTorch*:

> Convolution, or more precisely, discrete convolution¹...
>
> ¹ There is a subtle difference between PyTorch's convolution and mathematics' convolution: one argument's sign is flipped. If we were in a pedantic mood, we could call PyTorch's convolutions discrete cross-correlations.
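
(For concreteness, the "flipped sign" they mention is the difference between the standard definitions of discrete convolution and cross-correlation, written here in 1-D for real-valued signals:)

$$(f * g)[n] = \sum_{m} f[m]\, g[n - m], \qquad (f \star g)[n] = \sum_{m} f[m]\, g[n + m].$$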
But they don't explain why it was implemented like this. Is there a reason?
Maybe something similar to how the PyTorch implementation of `CrossEntropyLoss` isn't actually cross entropy, but an analogous function taking logits as inputs instead of raw probabilities (to avoid numerical instability)?
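
(By which I mean: as far as I understand the docs, `nn.CrossEntropyLoss` is essentially `LogSoftmax` followed by `NLLLoss`, so it consumes logits rather than probabilities. A quick sketch of what I have in mind:)

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)               # raw, unnormalized scores
targets = torch.randint(0, 10, (4,))

# cross_entropy takes logits directly...
loss_a = F.cross_entropy(logits, targets)

# ...and is equivalent to log-softmax followed by negative log-likelihood,
# which is more numerically stable than computing log(softmax(x)) separately.
loss_b = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss_a, loss_b))  # True
```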