Today my teacher asked me a question: he said a CNN exploits the translation invariance of images (or matrices). So what property does a Transformer exploit?
There are two main properties of transformers that make them so appealing compared to convolutions:
- A transformer is permutation equivariant. This makes transformers very useful for set prediction. For sequences and images, where order does matter, positional encodings/embeddings are used.
- The receptive field of a transformer is the entire input (!), as opposed to the very limited receptive field of a convolutional layer.
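Permutation equivariance is easy to check numerically: a minimal single-head self-attention sketch (no positional encoding, toy weights and sizes chosen here purely for illustration) shows that permuting the input tokens permutes the outputs in exactly the same way, while the dense attention matrix makes every output depend on every input.

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(X, Wq, Wk, Wv):
    # Single-head scaled dot-product self-attention, no positional encoding.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    # Row-wise softmax: each token attends to the ENTIRE input,
    # so the receptive field is global.
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

n, d = 5, 8  # 5 tokens of dimension 8 (arbitrary toy sizes)
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

perm = rng.permutation(n)
out = self_attention(X, Wq, Wk, Wv)
out_perm = self_attention(X[perm], Wq, Wk, Wv)

# Permutation equivariance: permuting the input rows permutes
# the output rows identically.
assert np.allclose(out[perm], out_perm)
```

With positional encodings added to `X`, this symmetry is deliberately broken, which is exactly why they are needed for sequences and images.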
See sec. 3 and fig. 3 in:
Shir Amir, Yossi Gandelsman, Shai Bagon and Tali Dekel, "Deep ViT Features as Dense Visual Descriptors" (arXiv 2021).

Shai