38

I am confused about the meaning of "backbone" in neural networks, especially in the DeepLabv3+ paper. I did some research and found that backbone could mean

the feature extraction part of a network

DeepLabv3+ uses Xception and ResNet-101 as its backbones. However, I am not familiar with the overall structure of DeepLabv3+: which part does the backbone refer to, and which parts stay the same when the backbone changes?

A generalized description or definition of backbone would also be appreciated.

Michael Mior
zheyuanWang
    I think that it is just a concept used in the paper https://arxiv.org/pdf/1703.06870.pdf. It is nothing special, just the first block of their image. – Peyman Jan 22 '20 at 21:57

4 Answers

39

In my understanding, the "backbone" refers to the feature extracting network which is used within the DeepLab architecture. This feature extractor is used to encode the network's input into a certain feature representation. The DeepLab framework "wraps" functionalities around this feature extractor. By doing so, the feature extractor can be exchanged and a model can be chosen to fit the task at hand in terms of accuracy, efficiency, etc.

In the case of DeepLab, the term backbone might refer to models such as ResNet, Xception, MobileNet, etc.
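To make the "wrapping" idea concrete, here is a minimal pure-Python sketch. It is not DeepLab code: `pool2x`, `pool4x`, and `SegmentationModel` are hypothetical names invented for illustration, and the "backbones" are simple average-pooling functions standing in for real networks such as ResNet or Xception. The only point is that the surrounding model stays the same while the feature extractor is swapped.

```python
from typing import Callable, List

FeatureMap = List[List[float]]
Backbone = Callable[[FeatureMap], FeatureMap]


def pool2x(image: FeatureMap) -> FeatureMap:
    """Toy 'backbone' (assumption): 2x2 average pooling, halving the resolution."""
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, len(image[0]) - 1, 2)
        ]
        for r in range(0, len(image) - 1, 2)
    ]


def pool4x(image: FeatureMap) -> FeatureMap:
    """A second toy 'backbone' (assumption): coarser features, 4x downsampling."""
    return pool2x(pool2x(image))


class SegmentationModel:
    """Hypothetical wrapper: the head is fixed, the backbone is exchangeable."""

    def __init__(self, backbone: Backbone, threshold: float = 0.5) -> None:
        self.backbone = backbone
        self.threshold = threshold

    def head(self, features: FeatureMap, out_h: int, out_w: int) -> List[List[int]]:
        # Nearest-neighbour upsampling back to the input size, then thresholding.
        fh, fw = len(features), len(features[0])
        return [
            [int(features[r * fh // out_h][c * fw // out_w] > self.threshold)
             for c in range(out_w)]
            for r in range(out_h)
        ]

    def __call__(self, image: FeatureMap) -> List[List[int]]:
        features = self.backbone(image)  # the exchangeable part
        return self.head(features, len(image), len(image[0]))  # the fixed part


image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
mask_fine = SegmentationModel(pool2x)(image)
mask_coarse = SegmentationModel(pool4x)(image)
```

Both models return a mask with the same size as the input; only the features that the head receives differ. In this toy case the coarser backbone averages away the left/right structure of the image, which loosely mirrors the accuracy/efficiency trade-off mentioned above.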

Michael Mior
FranklynJey
28

TL;DR Backbone is not a universal technical term in deep learning.

(Disclaimer: yes, there may be a specific kind of method, layer, tool etc. that is called "backbone", but there is no "backbone of a neural network" in general.)

If authors use the word "backbone" as they are describing a neural network architecture, they mean

  • feature extraction (the part of the network that "sees" the input). This interpretation is not universal in the field, though: in my opinion, computer vision researchers would use the term to mean feature extraction, whereas natural language processing researchers would not.
  • in informal language, that this part in question is crucial to the overall method.
Mathias Müller
18

Backbone is a term used in DeepLab models/papers to refer to the feature extractor network. This network computes features from the input image, and a simple decoder module in the DeepLab model then upsamples these features to generate segmentation masks. The DeepLab authors report results with different feature extractors (backbones) such as MobileNet, ResNet, and Xception.

Manas
6

CNNs are used for extracting features. Several CNNs are available, for instance AlexNet, VGGNet, and ResNet (backbones). These networks are mainly used for image classification and have been evaluated on widely used benchmarks and datasets such as ImageNet. In image classification or image recognition, the classifier assigns a single category to the whole image and gives the probability of the image matching that class. In object detection, by contrast, the model must recognize several objects in a single image and provide coordinates that identify the location of each object. This is why object detection can be more difficult than image classification.

source and more info: https://link.springer.com/chapter/10.1007/978-3-030-51935-3_30