Today my teacher asked me a question: he said that a CNN exploits the translation invariance of images or matrices. So which properties does a Transformer exploit?

1 Answer

There are two main properties of transformers that make them so appealing compared to convolutions:

  1. A transformer is permutation equivariant: permuting the input tokens permutes the outputs in the same way. This makes transformers very useful for set prediction. For sequences and images, where order does matter, positional encodings/embeddings are used.
  2. The receptive field of a transformer layer is the entire input (!), as opposed to the very limited receptive field of a convolution layer.
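
Both properties can be checked numerically. The following is only a minimal sketch, assuming a single-head self-attention layer with random weights and no masking; the names `self_attention`, `w_q`, `w_k`, `w_v` and `pos` are made up for illustration and do not come from any particular library.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention (illustrative only).
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
n_tokens, d_model = 6, 8
x = rng.normal(size=(n_tokens, d_model))
# Random projection matrices, scaled to keep the attention scores moderate.
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
                 for _ in range(3))

out, weights = self_attention(x, w_q, w_k, w_v)

# Property 1: permuting the input rows permutes the output rows identically.
perm = np.roll(np.arange(n_tokens), 1)  # a fixed, non-identity permutation
out_perm, _ = self_attention(x[perm], w_q, w_k, w_v)
equivariant = np.allclose(out_perm, out[perm])

# Adding a position-dependent term (a stand-in for a positional encoding)
# makes the layer sensitive to token order again.
pos = rng.normal(size=(n_tokens, d_model))
out_pos, _ = self_attention(x + pos, w_q, w_k, w_v)
out_pos_perm, _ = self_attention(x[perm] + pos, w_q, w_k, w_v)
order_matters_with_pos = not np.allclose(out_pos_perm, out_pos[perm])

# Property 2: every token attends to every other token with non-zero weight,
# so one layer already "sees" the whole input.
global_receptive_field = bool((weights > 0).all())

print(equivariant, order_matters_with_pos, global_receptive_field)
```

A convolution layer, by contrast, would give each output position a non-zero dependence only on the inputs inside its kernel window.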

See Sec. 3 and Fig. 3 in:
Shir Amir, Yossi Gandelsman, Shai Bagon and Tali Dekel, "Deep ViT Features as Dense Visual Descriptors" (arXiv 2021).

Shai