
I'm trying to understand what nn.Conv2d does internally.

So let's assume we are applying Conv2d to a 32x32 RGB image:

torch.nn.Conv2d(3, 49, 4, bias=True)

So:

  1. When we initialize the conv layer, how many weights does it have, and in what shapes? Please count the biases separately.
  2. Before applying the conv, the image has shape 3 * 32 * 32, and after applying it the output has shape 49 * 29 * 29. So what happens in between?

I define a "slide" operation (I don't know the real name) as multiplying the first element of the kernel by the first element of a kernel-sized box in the image, and so on up to the last element of the kernel, so that one of the 29 * 29 output values is calculated. "Slide all" repeats this horizontally and vertically until all 29 * 29 values are calculated.

So I understand how a single kernel would act, but I don't understand how many kernels are created by torch.nn.Conv2d(3, 49, 4, bias=True), and which of them are applied to the R, G, and B channels.
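For reference, the shape change described above can be checked directly; this is a minimal sketch assuming PyTorch is installed, with a random tensor standing in for the image:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 49, 4, bias=True)
x = torch.randn(1, 3, 32, 32)  # a batch of one 32x32 RGB image
y = conv(x)
print(y.shape)  # torch.Size([1, 49, 29, 29]), since 32 - 4 + 1 = 29
```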

Farhang Amaji
    see [this](https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1) – Shai Sep 09 '21 at 13:10
  • @Shai thanks, so I think the answer to my first question is 49*4*4 weights + 49 biases. – Farhang Amaji Sep 09 '21 at 13:33
  • No, as Ivan said, 49*(4*4*3 + 1) would be the parameter count. – Farhang Amaji Sep 09 '21 at 13:35
  • @FarhangAmaji Indeed, 49*4*4*3 weights + 49 biases. – Ivan Sep 09 '21 at 13:39
  • @Ivan I have another question, which I asked on Cross Validated, but they closed it, which is unacceptable to me: https://stats.stackexchange.com/questions/543809/recommended-ways-of-splitting-train-test-in-time-series-in-neural-networks. All the alternative links provided there are about cross-validation, not data splitting. – Farhang Amaji Sep 09 '21 at 13:43
  • First of all, aren't data splitting and cross-validation different things? If so, they shouldn't have closed it. – Farhang Amaji Sep 09 '21 at 13:43
  • @FarhangAmaji This is a different question from *understanding pytorch conv2d internally*; the best thing would be to create a new question on Stack Overflow with your problem. – Ivan Sep 09 '21 at 13:49
  • I know; I was just asking a favor of you. Don't get me wrong, the favor is a short answer to "aren't cross-validation and data splitting different?" so I can decide whether to keep trying to reopen my question there, since my flag was declined. As I think you probably know the answer: cross-validation is about comparing different models, but data splitting is more general and can be applied to a single model, not just different ones. – Farhang Amaji Sep 09 '21 at 13:58
  • Unfortunately, no one gives reasons for declined flags. – Farhang Amaji Sep 09 '21 at 14:00
  • Of course I think I'm right, but the declined flag made me doubt it. – Farhang Amaji Sep 09 '21 at 14:06
  • Did you read the [proposed thread](https://stats.stackexchange.com/questions/14099/using-k-fold-cross-validation-for-time-series-model-selection) they linked you? – Ivan Sep 09 '21 at 14:08
  • Kind of; there are answers about cross-validation there, I think, even the Hyndman and Athanasopoulos approach. – Farhang Amaji Sep 09 '21 at 14:30

1 Answer


I understand how a kernel would act, but I don't understand how many kernels are created by nn.Conv2d(3, 49, 4, bias=True), and which of them are applied to the R, G, and B channels.

Calling nn.Conv2d(3, 49, 4, bias=True) will initialize 49 4x4 kernels, each spanning all three input channels and having a single bias parameter. That's a total of 49*(4*4*3 + 1) = 2,401 parameters.

You can check that it is indeed correct with:

>>> conv2d = nn.Conv2d(3, 49, 4, bias=True)

The parameter list will contain the weight tensor, shaped (n_filters=49, n_channels=3, kernel_height=4, kernel_width=4), and a bias tensor, shaped (49,):

>>> [p.shape for p in conv2d.parameters()]
[torch.Size([49, 3, 4, 4]), torch.Size([49])]

If we look at the total number of parameters, we indeed find:

>>> nn.utils.parameters_to_vector(conv2d.parameters()).numel()
2401

Concerning how they are applied: each of the 49 kernels is applied independently to the input map. For each filter operation, you convolve the three-channel input with a three-channel kernel, then add that filter's bias. This leaves you with 49 single-channel maps, which are stacked to form a single 49-channel output. In practice, everything is done in one go using a windowed view of the input.
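The per-kernel description above can be sketched as an explicit loop and checked against nn.Conv2d itself. This is an illustrative sketch (a random input stands in for the image; a real implementation would use a vectorized windowed view, not Python loops):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 49, 4, bias=True)
x = torch.randn(1, 3, 32, 32)

# Manual version: for each of the 49 kernels, slide a 3x4x4 window
# over the input, take the elementwise product-and-sum, and add
# that kernel's bias.
out = torch.empty(1, 49, 29, 29)
for k in range(49):                              # one loop per output channel
    for i in range(29):                          # vertical slide
        for j in range(29):                      # horizontal slide
            window = x[0, :, i:i+4, j:j+4]       # 3x4x4 patch, all channels
            out[0, k, i, j] = (window * conv.weight[k]).sum() + conv.bias[k]

print(torch.allclose(out, conv(x), atol=1e-4))  # True
```

Note that each kernel sees all three of R, G, and B at once: the window and the kernel are both 3x4x4, so there is no separate per-channel kernel, just one three-channel kernel per output map.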

I am certainly biased towards my own posts: here you will find another explanation of shapes in convolutional neural networks.

Ivan