
I am trying to understand the reason behind the answer to this question. I was expecting the number of parameters to be:

total_params = (filter_height * filter_width  + 1) * number_of_filters

BUT you have to multiply the height and width by the number of input channels. Why is this? Isn't there parameter sharing for this dimension? If this is the case, how does this help with feature recognition?

I would expect a CNN to be able to infer relationships between channels, but I haven't seen how this is explicitly done.

Panda

1 Answer


Imagine you have an RGB image and want to pass a single filter: number_of_filters = 1.

How would this filter treat each of the input channels: R, G and B?

Should the filter treat all input channels equally? Does the green channel bring the same information as the red?

Well, no: each channel carries its own information, so the filter must consider all input channels; otherwise it would not be looking at the whole image.

This is exactly the same as with dense/fully connected networks, where you have:

total_params = (input_dim + 1) * units

The only difference is that a convolutional filter also has a height and width, so each filter holds filter_height * filter_width * input_channels weights plus one bias.
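
A minimal sketch of this (Keras is assumed here; the question doesn't name a framework) that checks both formulas by counting trainable parameters:

    import tensorflow as tf

    # Dense layer: (input_dim + 1) * units
    dense = tf.keras.Sequential([tf.keras.layers.Dense(units=10, input_shape=(64,))])
    print(dense.count_params())  # (64 + 1) * 10 = 650

    # Conv2D layer: (filter_height * filter_width * input_channels + 1) * number_of_filters
    conv = tf.keras.Sequential([tf.keras.layers.Conv2D(filters=1, kernel_size=(3, 3), input_shape=(32, 32, 3))])
    print(conv.count_params())  # (3 * 3 * 3 + 1) * 1 = 28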

Daniel Möller
  • Thank you for your answer. So the intuition is that channels are treated differently: what's shared are just the spatial parameters of the filter (x, y), but not z. For each filter then, e.g. (2x2) with three channels, there are 2x2x3 weights (plus biases) to learn? – Panda Sep 17 '19 at 21:18
  • Yes, in that example there are 2x2x3 weights. I'm not sure I understand what you mean by "shared". The kernel can slide through x and y, but always occupies the whole Z (not sliding in Z). – Daniel Möller Sep 18 '19 at 13:15
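
For what it's worth, here is a small sketch (again assuming Keras) of the 2x2 filter / 3 channel case from the comment above, showing that the kernel holds 2x2x3 = 12 weights plus a single bias:

    import tensorflow as tf

    layer = tf.keras.layers.Conv2D(filters=1, kernel_size=(2, 2))
    layer.build(input_shape=(None, 32, 32, 3))  # 3 input channels (RGB)

    kernel, bias = layer.weights
    print(kernel.shape)  # (2, 2, 3, 1) -> height, width, input channels, filters
    print(bias.shape)    # (1,)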