
I am reading through the Residual Learning paper, and I have a question. What is the "linear projection" mentioned in section 3.2? It looks pretty simple once you get it, but I could not grasp the idea...

Can someone provide simple example?

Troy
    I’m voting to close this question because it is not about programming as defined in the [help] but about ML theory and/or methodology - please see the intro and NOTE in https://stackoverflow.com/tags/machine-learning/info – desertnaut May 05 '22 at 21:10

3 Answers


First up, it's important to understand what x, y and F are and why they need any projection at all. I'll try to explain in simple terms, but a basic understanding of ConvNets is required.

x is the input data (a tensor) of the layer; in the case of ConvNets its rank is 4. You can think of it as a 4-dimensional array. F is usually a conv layer (conv+relu+batchnorm in this paper), and y combines the two together (forming the output channel). The result of F is also of rank 4, and most of its dimensions will be the same as in x, except for one. That's exactly what the transformation should patch.

For example, x's shape might be (64, 32, 32, 3), where 64 is the batch size, 32x32 is the image size and 3 stands for the (R, G, B) color channels. F(x) might be (64, 32, 32, 16): the batch size never changes and, for simplicity, a ResNet conv layer doesn't change the image size either, but it will likely use a different number of filters, here 16.

So, in order for y=F(x)+x to be a valid operation, x must be "reshaped" from (64, 32, 32, 3) to (64, 32, 32, 16).

I'd like to stress here that "reshaping" here is not what numpy.reshape does.

Instead, the last (channel) dimension of x is padded with 13 zeros, like this:

pad(x=[1, 2, 3],padding=[7, 6]) = [0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 0, 0, 0]

If you think about it, this is a projection of a 3-dimensional vector onto 16 dimensions. In other words, we start to think that our vector is the same, but there are 13 more dimensions out there. None of the other x dimensions are changed.
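The padding-as-projection idea can be sketched with NumPy (the shapes here are illustrative, not from the paper):

```python
import numpy as np

# Toy input in NHWC layout: batch of 2, 4x4 images, 3 channels
x = np.random.rand(2, 4, 4, 3)

# "Project" the channel dimension from 3 to 16 by zero-padding
# only the last axis (7 zeros before, 6 after), leaving the
# batch and spatial dimensions untouched.
x_proj = np.pad(x, [(0, 0), (0, 0), (0, 0), (7, 6)])

print(x_proj.shape)                       # (2, 4, 4, 16)
print(np.allclose(x_proj[..., 7:10], x))  # True: original channels preserved
```

Now y = F(x) + x_proj is a valid element-wise addition.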

Here's the link to the code in Tensorflow that does this.

Maxim
  • Thank you very much! I have mainly used MATLAB rather than Python, so there might be a misunderstanding, I guess. In MATLAB the last dimension is the number of images; in Python the first dimension is. – Troy Sep 21 '17 at 14:31
  • Got you. The order may be different, but the projection should be done as described in the answer. – Maxim Sep 21 '17 at 14:46
  • Please disregard above one. – Troy Sep 21 '17 at 14:50
  • Ah... again... habitually punched enter. Here is my actual question: thank you very much, but I still have questions; I blame my clumsiness. My first question is about the width and height reduction in a residual connection. For example, they used a stride of 2, which leads to a width and height reduction rather than a change in the number of filters. That was my first question. After your lesson, I realized a different number of channels also causes a problem. Could you give me another lesson covering both, i.e., a different (width and height) and a different channel count in a residual connection? – Troy Sep 21 '17 at 14:50
  • Good question, but not enough space here to fully answer it. In short: when the layer downsamples the image (by using `strides=2`), `x` goes through a pooling layer **as well** with the same stride. So both `F(x)` and `x` reduce the size of the image by half, and just like before only the "channel" dimension needs to be projected. I could only find an example in python: https://github.com/tflearn/tflearn/blob/master/examples/images/residual_network_mnist.py You can see two layers with `downsample=True`, both of which scale down the image. – Maxim Sep 21 '17 at 15:45
  • The link to tensorflow code wasn't anchored to a specific commit so I believe it is now pointing to the wrong line of code as a result of changes on the master branch – Xander Dunn Dec 06 '20 at 19:55

A linear projection is one where each new feature is simply a weighted sum of the original features. As in the paper, this can be represented by matrix multiplication. If x is the vector of N input features and W is an M-by-N matrix, then the matrix product Wx yields M new features, where each one is a linear projection of x. Each row of W is a set of weights that defines one of the M linear projections (i.e., each row of W contains the coefficients for one of the weighted sums of x).
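A minimal sketch in NumPy (the names x and W are from the answer above; the numbers are made up):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])      # N = 3 input features
W = np.arange(15.0).reshape(5, 3)  # M-by-N weight matrix with M = 5

y = W @ x                          # each entry of y is a weighted sum of x
print(y.shape)           # (5,)
print(y[0] == W[0] @ x)  # True: row 0 of W defines the first projection
```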

bogatron
  • Thank you for your kind explanations. Please confirm that I understand correctly: if an input x is 3x3 and we want to project it to 4x4, then we vectorize x [3x3] to [9x1], and W will be [16x9]. Therefore W [16x9] x [9x1] = [16x1], and we reshape it to [4x4]. Is this what you explained? – Troy Sep 09 '17 at 20:33
  • Yes, you got it. – bogatron Sep 09 '17 at 21:12
  • @W.Choi this answer is technically correct, but a bit misleading, as can be seen by your comment. Please see my answer. – Maxim Sep 20 '17 at 13:34

In PyTorch (in particular torchvision/models/resnet.py), at the end of a Bottleneck you will have one of two scenarios:

  1. The input vector x's channels, say x_c (not the spatial resolution, but the channels), do not match the output of layer conv3 of the Bottleneck, say d dimensions. This is handled by a 1-by-1 convolution with in_planes = x_c and out_planes = d, with stride 1, followed by batch normalization; then the addition F(x) + x occurs, assuming x and F(x) have the same spatial resolution.

  2. Both the spatial resolution of x and its number of channels don't match the output of the Bottleneck, in which case the 1-by-1 convolution mentioned above needs stride 2 so that both the spatial resolution and the number of channels match for the element-wise addition (again with batch normalization of x before the addition).
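The second scenario can be sketched in a few lines of PyTorch (the shapes here are hypothetical, but the Sequential mirrors the shape of the downsample branch in torchvision's resnet.py):

```python
import torch
import torch.nn as nn

# Hypothetical shortcut: map 64 channels at 56x56
# to 256 channels at 28x28 (channels AND resolution change).
x = torch.randn(1, 64, 56, 56)

downsample = nn.Sequential(
    nn.Conv2d(64, 256, kernel_size=1, stride=2, bias=False),  # 1x1 conv, stride 2
    nn.BatchNorm2d(256),
)

shortcut = downsample(x)
print(shortcut.shape)  # torch.Size([1, 256, 28, 28])
```

The result can now be added element-wise to the Bottleneck output F(x).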

IntegrateThis