
Given a PyTorch input dataset with dimensions:

dat.shape = torch.Size([128, 3, 64, 64])


This is a supervised learning problem: we have a separate labels.txt file containing one of C classes for each input observation. The value of C is determined by the number of distinct values in the labels file and is presently in the single digits.

I could use assistance on how to connect the layers of a simple network that mixes convolutional and linear layers to perform multiclass classification. The intent is for the input to pass through:

  • two cnn layers with maxpooling after each
  • a linear "readout" layer
  • softmax activation before the output/labels

Here is the core of my (faulty/broken) network. I am unable to determine the proper size/shape required of:

 Output of Convolutional layer -> Input of Linear [Readout] layer
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNClassifier(torch.nn.Module):

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.maxpool = nn.MaxPool2d(kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3)
        self.linear1 = nn.Linear(32*16*16, C)  # C = number of classes from labels.txt
        self.softmax1 = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.maxpool(F.leaky_relu(x))
        x = self.conv2(x)
        x = self.maxpool(F.leaky_relu(x))
        x = self.linear1(x)  # Size mismatch error HERE 
        x = self.softmax1(x)
        return x

Training of the model is started by:

        Xout = model(dat)

This results in:

RuntimeError: size mismatch, m1: [128 x 1568], m2: [8192 x 6]

at the linear1 input. What is needed here? Note I have seen uses of wildcard input sizes, e.g. via a view:

    ..
    x = x.view(x.size(0), -1)
    x = self.linear1(x)  # Size mismatch error HERE 

If that is included, the error changes to

RuntimeError: size mismatch, m1: [28672 x 7], m2: [8192 x 6]

Some pointers on how to think about and calculate the cnn layer / linear layer input/output sizes would be much appreciated.

WestCoastProjects

2 Answers


The error

You have miscalculated the output size of the convolutional stack. It is actually [batch, 32, 7, 7] instead of [batch, 32, 16, 16].

You have to use reshape (or view) because the output of Conv2d has 4 dimensions ([batch, channels, height, width]), while the input to nn.Linear is required to have 2 dimensions ([batch, features]).

Use this for nn.Linear:

self.linear1 = nn.Linear(32 * 7 * 7, C)

And this in forward:

x = self.linear1(x.view(x.shape[0], -1))
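
If it helps to verify that number, here is a minimal sketch (my addition, not part of the original answer) that traces one spatial dimension through the layers defined in the question using the standard output-size formula; note that MaxPool2d's stride defaults to its kernel_size:

def out_size(w, k, p=0, s=1):
    # floor((W - K + 2P) / S) + 1 for one spatial dimension of a conv/pool layer
    return (w - k + 2 * p) // s + 1

w = 64                            # input height/width
w = out_size(w, k=3)              # conv1: kernel 3, no padding, stride 1 -> 62
w = out_size(w, k=3, p=1, s=3)    # maxpool: kernel 3, padding 1, stride defaults to 3 -> 21
w = out_size(w, k=3)              # conv2 -> 19
w = out_size(w, k=3, p=1, s=3)    # maxpool -> 7
print(w, 32 * w * w)              # 7 1568, matching m1: [128 x 1568] in the error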

Other possibilities

Many current architectures pool over the spatial dimensions of each channel (usually called global pooling). In PyTorch this is available as torch.nn.AdaptiveAvgPool2d (there is also a max-pooling variant). This approach lets the height and width of the input image vary, because only one value per channel is passed as input to nn.Linear. This is how it looks:

class CNNClassifier(torch.nn.Module):
    def __init__(self, C=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.maxpool = nn.MaxPool2d(kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3)
        self.pooling = torch.nn.AdaptiveAvgPool2d(output_size=1)  # global average pooling: one value per channel
        self.linear1 = nn.Linear(32, C)
        self.softmax1 = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.maxpool(F.leaky_relu(x))
        x = self.conv2(x)
        x = self.maxpool(F.leaky_relu(x))
        x = self.linear1(self.pooling(x).view(x.shape[0], -1))
        x = self.softmax1(x)
        return x

So now inputs of torch.Size([128, 3, 64, 64]) and torch.Size([128, 3, 128, 128]) can both be passed to the network.
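
For example, a quick check along those lines (my sketch, assuming the class above with its default C=10 and the same imports as in the question) might be:

import torch

model = CNNClassifier()                       # uses the default C=10 from the constructor above
for size in (64, 128):
    dummy = torch.randn(128, 3, size, size)   # two different spatial resolutions
    print(size, model(dummy).shape)           # torch.Size([128, 10]) in both cases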

Szymon Maszke
    The addition of the `AdaptiveAvgPool2d` is perfect for the case where you might have multiple image sizes; I thought of adding it myself. +1 – David Oct 08 '20 at 14:11
  • Thx! For the first approach and size `32x7x7` Did you calculate that via `O = (W - K + 2*P)/S + 1` ? – WestCoastProjects Oct 08 '20 at 17:52
  • @javadba You can find exact formulas in PyTorch docs. This time I've just `printed` `shape` attribute before `linear`. Also, usually, with `kernel_size=3` in `Conv` you use `padding=1` so the shape stays the same (and whole thing is easier to follow). – Szymon Maszke Oct 08 '20 at 18:00

So the issue is with the way you defined the nn.Linear. You set its input size to 32*16*16, but 16*16 is not the spatial size of the feature map at that point; the numbers 16 and 32 are only the channel counts that the Conv2d layers expect as input and produce as output, so they tell you nothing about the height and width after the convolutions and pooling.

If you add print(x.shape) just before the fully connected layer, you will get:

torch.Size([Batch, 32, 7, 7])

So your calculation should have been 7*7*32:

self.linear1 = nn.Linear(32*7*7, C)

And then using:

x = x.view(x.size(0), -1)
x = self.linear1(x)

Will work perfectly fine. You can read about what view does in: How does the "view" method work in PyTorch?
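
As a small illustration of what that view call does here (hypothetical tensor, shapes only):

import torch

x = torch.randn(128, 32, 7, 7)     # shape coming out of the conv/pool stack
flat = x.view(x.size(0), -1)       # keep the batch dimension, flatten the rest
print(flat.shape)                  # torch.Size([128, 1568]) -> matches nn.Linear(32*7*7, C)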

David