For an image or sequence, what properties do transformers use?

Question

Today my teacher asked me a question: he said that a CNN exploits the translation invariance of images or matrices. So what properties does a Transformer exploit?

score 1 · Accepted Answer · answered Jan 05 '22 at 08:50

There are two main properties of transformers that make them so appealing compared to convolutions:

1. A transformer is permutation equivariant: without positional information, reordering the input tokens simply reorders the outputs in the same way. This makes transformers very useful for set predictions. For sequences and images, where order does matter, positional encodings/embeddings are added to the input.
2. The receptive field of a transformer is the entire input (!), as opposed to the very limited receptive field of a convolution layer.
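
Both properties can be checked numerically. Below is a minimal sketch, assuming a single self-attention head in NumPy with random weights and no positional encoding. The function `self_attention` and all variable names are made up for illustration; this is not code from the paper cited in this answer or from any particular library.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention, no positional encoding. x: (n_tokens, d)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])        # (n_tokens, n_tokens)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ v

rng = np.random.default_rng(0)
n_tokens, d = 6, 8
x = rng.normal(size=(n_tokens, d))
# Hypothetical random projections, scaled so the softmax is not saturated.
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

out = self_attention(x, w_q, w_k, w_v)

# Property 1: permuting the input tokens permutes the output the same way.
perm = rng.permutation(n_tokens)
out_perm = self_attention(x[perm], w_q, w_k, w_v)
equivariant = np.allclose(out_perm, out[perm])

# Property 2: perturbing only the last token changes every output token,
# because each token attends to the whole input.
x_changed = x.copy()
x_changed[-1] += 1.0
out_changed = self_attention(x_changed, w_q, w_k, w_v)
all_changed = bool(np.all(np.abs(out_changed - out).max(axis=1) > 1e-9))

print("permutation equivariant:", equivariant)
print("every output depends on the last token:", all_changed)
```

A single convolution layer with a small kernel would fail the second check: output positions far from the perturbed one would stay exactly the same.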

See sec. 3 and fig. 3 in:
Shir Amir, Yossi Gandelsman, Shai Bagon and Tali Dekel, "Deep ViT Features as Dense Visual Descriptors" (arXiv 2021).
