
We know that a convolution layer in a CNN uses filters, and different filters look for different information in the input image.

But say in this SSD network we have a prototxt file, and it specifies a convolution layer as

layer {
  name: "conv2_1"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2_1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
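For reference, the convolution_param above, together with the input shape, determines the output shape. A minimal plain-Python sketch of the standard formula (the 150x150 input size is an assumption, matching pool1 of the 300x300 SSD variant):

# Output spatial size of a Caffe convolution layer:
#   out = (in + 2*pad - kernel_size) / stride + 1   (stride defaults to 1)
def conv_output_size(in_size, kernel_size, pad=0, stride=1):
    return (in_size + 2 * pad - kernel_size) // stride + 1

# conv2_1 above: kernel_size=3, pad=1 -> spatial size is preserved;
# num_output=128 only sets the number of filters (output channels).
print(conv_output_size(150, kernel_size=3, pad=1))  # 150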

Convolution layers in different networks (GoogLeNet, AlexNet, VGG, etc.) all look more or less the same. Looking only at a definition like this, how can one tell which information the filters in this convolution layer try to extract from the input image?

EDIT: Let me clarify my question. Below are two convolution layers from the SSD prototxt file.

layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}

layer {
  name: "conv2_1"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2_1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
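Although these two definitions look alike, the learned weights behind them differ. A minimal pycaffe sketch to check this (the deploy.prototxt and ssd.caffemodel file names are placeholders for your own files):

import numpy as np
import caffe

# Load the trained SSD network (file names are placeholders).
net = caffe.Net('deploy.prototxt', 'ssd.caffemodel', caffe.TEST)

# net.params[layer][0] holds the filter weights, [1] the biases.
w1 = net.params['conv1_1'][0].data  # shape (64, 3, 3, 3): 64 filters over 3 input channels
w2 = net.params['conv2_1'][0].data  # shape (128, 64, 3, 3): 128 filters over 64 channels

print(w1.shape, w2.shape)
# The prototxt definitions look alike, but the learned weights differ,
# so the two layers respond to different patterns.
print(np.abs(w1).mean(), np.abs(w2).mean())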

Then I printed their outputs.

Data: [input image]

The conv1_1 and conv2_1 outputs are here and here.

So my question is: how do these two conv layers produce different outputs when there is no difference between their definitions in the prototxt file?

batuman
  • "Just look at that and how to understand, filters in this convolution layer try to extract which information of the input image?" Didn't get you? – Harsh Wardhan Jan 12 '18 at 09:25
  • I didn't get your question. Do you want to know what each filter on each layer is looking for? – Chan Kha Vu Jan 12 '18 at 09:48
  • In general, the first layers extract edge-like features ( which are appropriate for accurate localization) but as you go deeper into the network, filter mostly works on blob-shape feature which is appropriate for discriminating object with each other. – Hossein Kashiani Jan 14 '18 at 15:15
  • @HosseinKa Yes, that is what I meant. How can you tell that the first convolution is looking for edges and the following ones for blob-shaped features? In the prototxt file they all look the same. How do you know which convolution is looking for which information? – batuman Jan 16 '18 at 01:05
  • @FalconUA I have updated in EDIT. – batuman Jan 16 '18 at 07:09

2 Answers


[filter visualizations at increasing depth] The filters at earlier layers represent low-level features like edges (these features retain higher spatial resolution for precise localization, with low-level visual information similar to the response maps of Gabor filters). The filters at mid-level layers, on the other hand, extract more complex features like corners or blobs.

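Since the first layer's filters connect directly to the RGB input, you can display them as small images and see the edge patterns directly. A minimal pycaffe/matplotlib sketch (file names are placeholders):

import caffe
import matplotlib.pyplot as plt

net = caffe.Net('deploy.prototxt', 'ssd.caffemodel', caffe.TEST)

# conv1_1 weights: (64, 3, 3, 3) -> 64 tiny RGB patches.
w = net.params['conv1_1'][0].data
w = (w - w.min()) / (w.max() - w.min())  # rescale to [0, 1] for display

fig, axes = plt.subplots(8, 8, figsize=(6, 6))
for i, ax in enumerate(axes.flat):
    ax.imshow(w[i].transpose(1, 2, 0))  # channels-first -> HWC for imshow
    ax.axis('off')
plt.show()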

As you go deeper, you can no longer visualize and interpret these features directly, because filters in mid-level and high-level layers are not connected to the input image itself. For instance, you can take the output of the first layer and visualize and interpret it as edges, but when you apply the second convolution layer to these extracted edges (the output of the first layer), you get something like "edges of edges", which captures more semantic information and less fine-grained spatial detail. In the prototxt file all the convolutions and other operations can resemble each other, but they extract different kinds of features because they sit at different depths and have different weights. [visualization of features at increasing depth]
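You can also visualize the responses themselves after a forward pass. Note that each channel of a conv blob is a single-channel map, so it naturally plots as grayscale. A minimal sketch (file names are placeholders, and a preprocessed input image must already be loaded):

import caffe
import matplotlib.pyplot as plt

net = caffe.Net('deploy.prototxt', 'ssd.caffemodel', caffe.TEST)
# ... load a preprocessed image into net.blobs['data'].data here ...
net.forward()

# Each channel of a conv blob is one filter's response map.
for name in ('conv1_1', 'conv2_1'):
    maps = net.blobs[name].data[0]  # (channels, H, W)
    fig, axes = plt.subplots(4, 4, figsize=(6, 6))
    fig.suptitle(name)
    for i, ax in enumerate(axes.flat):
        ax.imshow(maps[i], cmap='gray')  # single-channel maps plot as grayscale
        ax.axis('off')
plt.show()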

Hossein Kashiani
  • Nice visualization images! – Chan Kha Vu Jan 16 '18 at 13:31
  • By the way, how do you plot images of these conv outputs? I plotted those blobs myself. Are there any tools for plotting? When I plot the blobs, I only get grayscale. – batuman Jan 17 '18 at 01:16
  • Say we have a fresh network with random weights and we start training. In that scenario we only know the loss at the last layer; we calculate gradients and back-propagate layer by layer. Each conv layer's weights are then updated based on the loss, by SGD. So the layers end up, by themselves, in a state where this conv layer extracts this information, that conv layer extracts that information, etc. Is my understanding true? – batuman Jan 17 '18 at 01:37
  • @batuman Those images are not mine, maybe [this](https://arxiv.org/pdf/1412.6806.pdf) paper could help you to draw them. I think that's true. – Hossein Kashiani Jan 18 '18 at 17:53

"Convolution" layer differ not only in their parameters (e.g., kernel_size, stride, pad etc.) but also in their weights: the trainable parameters of the convolution kernels.
You see different output (aka "responses") because the weights of the filters are different.

See this answer regarding the difference between "data" blobs and "parameter/weights" blobs in Caffe.
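As a minimal illustration of that distinction (file names are placeholders; the shapes in the comments assume the 300x300 SSD variant):

import caffe

net = caffe.Net('deploy.prototxt', 'ssd.caffemodel', caffe.TEST)

# net.blobs:  "data" blobs -- the activations flowing through the net,
#             recomputed on every forward pass.
print(net.blobs['conv2_1'].data.shape)      # e.g. (1, 128, 150, 150)

# net.params: "parameter/weights" blobs -- the trained filters and biases,
#             fixed at test time and different for every layer.
print(net.params['conv2_1'][0].data.shape)  # (128, 64, 3, 3)
print(net.params['conv2_1'][1].data.shape)  # (128,)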

Shai
  • Your discussion in the link is quite clear. Thank you. – batuman Jan 17 '18 at 01:13