I don't understand pytorch input sizes of conv1d, conv2d

Question

I have data consisting of 2 time series of 18 points each, so I organized it as a matrix with 18 rows and 2 columns (there are 180 samples to classify into 2 classes: activated and non-activated).

I want to build a CNN for this data, where my kernel moves in one direction, along the rows (time). See the example in the attached figure.

My data 18x2

In my code, I don't know how many channels I have, compared with RGB images, which have 3 channels. I also don't know the input sizes of the layers, or how to calculate the input size of the fully connected layer.

Do I need to use conv1d, conv2d, or conv3d? Based on "Understand conv 1D 2D 3D", I have 2D inputs and I want to do a 1D convolution (because I move my kernel in one direction). Is that correct?

How do I pass a kernel size of (3, 2), for example?

My data has this shape after using DataLoader with batch_size=4:

print(data.shape, label.shape)

torch.Size([4, 2, 18]) torch.Size([4, 1])

My convolutional model is:

Note: I just put arbitrary numbers for the input/output sizes.

# Creating our CNN Model -> 1D convolutional with 2D input (HbO, HbR)

class ConvModel(nn.Module):
    def __init__(self):
        super(ConvModel, self).__init__()
        self.conv1 = nn.Conv1d(in_channels=1,  out_channels= 18, kernel_size=3, stride = 1)
# I dont know the in/out channels of the first conv
        self.maxpool = nn.MaxPool1d(kernel_size=3, stride=3)
        self.conv2 = nn.Conv1d(18, 32, kernel_size=3)
        self.fc1 = nn.Linear(200, 100)  #What I put in/out here ?
        self.fc2 = nn.Linear(100, 50)
        self.fc3 = nn.Linear(50, 2)

    def forward(self, x):
        x = F.relu(self.mp(self.conv1(x)))
        x = self.maxpool(x)

        x = F.relu(self.mp(self.conv2(x)))
        x = self.maxpool(x)

        x = x.view(-1, ??)  # flatten the tensor, which number here ?

        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)

        return x

jodag · Answer 1 · 2020-03-08T20:22:03.670

You will want to use a two-channel conv1d as the first convolution layer, i.e. it will take in a tensor of shape [B, 2, 18]. A 2-channel input with kernel size 3 defines kernels of shape [2, 3], where the kernel slides along the last dimension of the input. The number of channels C1 in your output feature map is up to you: C1 defines how many independent [2, 3] kernels you learn, and each convolution with a [2, 3] kernel produces one output channel.
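The kernel shape described above can be checked by inspecting the layer's weight tensor. This is a minimal sketch; C1 = 8 is an arbitrary value picked only for illustration, and the dummy batch just mimics the [4, 2, 18] shape from the question.

```python
import torch
import torch.nn as nn

C1 = 8  # arbitrary number of output channels (an assumption for this sketch)

conv = nn.Conv1d(in_channels=2, out_channels=C1, kernel_size=3)

# One [2, 3] kernel per output channel, so the weight has shape [C1, 2, 3]
print(conv.weight.shape)

# Dummy batch shaped like the question's data: [B, 2, 18]
x = torch.randn(4, 2, 18)
y = conv(x)
print(y.shape)
```

Because each kernel always spans all input channels, only the length along the time axis (3 here) is passed as kernel_size; there is no need to pass (3, 2).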

Note that if you don't add any zero padding to conv1d, the output length for a size-3 kernel is reduced by 2, i.e. you will get [B, C1, 16]. If you include a padding of 1 (which effectively pads both sides of the input with a column of zeros before convolving), the output will be [B, C1, 18].
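The effect of padding on the output length can be seen by running the same dummy input through both variants. This is only a sketch; C1 = 8 and the batch size of 4 are arbitrary choices.

```python
import torch
import torch.nn as nn

C1 = 8  # arbitrary number of output channels (an assumption for this sketch)
x = torch.randn(4, 2, 18)  # dummy batch: [B, 2, 18]

no_pad = nn.Conv1d(2, C1, kernel_size=3)             # no zero padding
padded = nn.Conv1d(2, C1, kernel_size=3, padding=1)  # one zero on each side

out_no_pad = no_pad(x)
out_padded = padded(x)

print(out_no_pad.shape)  # length shrinks by kernel_size - 1
print(out_padded.shape)  # length is preserved
```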

Max-pooling doesn't change the number of channels. If you use a kernel size of 3, a stride of 3, and no padding, the last dimension is reduced to floor(x.size(2) / 3), where x is the input tensor to the max-pooling layer. If the input length isn't a multiple of 3, the values at the end of x's feature map are ignored (AKA a kernel/window alignment issue).
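The alignment issue is easy to see on a toy input. In this sketch a single channel holds the values 0 to 15, so the pooled output shows exactly which positions each window covered.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool1d(kernel_size=3, stride=3)

# One sample, one channel, 16 time steps holding the values 0..15
x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 16)
y = pool(x)

print(y.shape)      # torch.Size([1, 1, 5])
print(y.flatten())  # maxima of the windows [0..2], [3..5], ..., [12..14]
```

The last input value (15) never falls inside a window, so it has no influence on the output.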

I recommend taking a look at the documentation for nn.Conv1d and nn.MaxPool1d, since they provide equations to compute the output shape.
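For reference, the output-length equation from those documentation pages can be written as a small helper. The function name out_length is made up for this sketch; the formula itself is the one the docs give for both layers (with ceil_mode=False for pooling).

```python
def out_length(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along the last dimension for Conv1d / MaxPool1d.

    Note: MaxPool1d defaults its stride to kernel_size, so pass stride
    explicitly when using this helper for pooling layers.
    """
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Convolution with kernel size 3 on 18 time steps
conv_no_pad = out_length(18, kernel_size=3)             # 16
conv_padded = out_length(18, kernel_size=3, padding=1)  # 18

# Max-pooling with kernel size 3 and stride 3
pool_16 = out_length(16, kernel_size=3, stride=3)  # 5
pool_18 = out_length(18, kernel_size=3, stride=3)  # 6

print(conv_no_pad, conv_padded, pool_16, pool_18)
```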

Let's consider two examples. You can define C1, C2, F1, F2 however you like. The optimal values will depend on your data.

Without padding we get

import torch.nn as nn
import torch.nn.functional as F

class ConvModel(nn.Module):
    def __init__(self, C1, C2, F1, F2):
        # C1, C2, F1, F2 are layer sizes you choose
        super().__init__()
        # input [B, 2, 18]
        self.conv1 = nn.Conv1d(in_channels=2, out_channels=C1, kernel_size=3)
        # [B, C1, 16]
        self.maxpool = nn.MaxPool1d(kernel_size=3, stride=3)
        # [B, C1, 5]    (WARNING last column of activations in previous layer is ignored b/c of kernel alignment)
        self.conv2 = nn.Conv1d(C1, C2, kernel_size=3)
        # [B, C2, 3]
        self.fc1 = nn.Linear(C2*3, F1)
        # [B, F1]
        self.fc2 = nn.Linear(F1, F2)
        # [B, F2]
        self.fc3 = nn.Linear(F2, 2)
        # [B, 2]

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.maxpool(x)

        # no second pooling here, so the shape stays [B, C2, 3] as fc1 expects
        x = F.relu(self.conv2(x))

        x = x.flatten(1) # flatten the tensor starting at dimension 1

        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)

        return x

Notice the kernel alignment issue with the max-pooling layer. It occurs because the input length to max-pooling isn't a multiple of 3. To avoid the alignment issue and to make output sizes more consistent, I recommend adding a padding of 1 to both convolution layers. Then you would have

import torch.nn as nn
import torch.nn.functional as F

class ConvModel(nn.Module):
    def __init__(self, C1, C2, F1, F2):
        # C1, C2, F1, F2 are layer sizes you choose
        super().__init__()
        # input [B, 2, 18]
        self.conv1 = nn.Conv1d(in_channels=2, out_channels=C1, kernel_size=3, padding=1)
        # [B, C1, 18]
        self.maxpool = nn.MaxPool1d(kernel_size=3, stride=3)
        # [B, C1, 6]    (no alignment issue b/c 18 is a multiple of 3)
        self.conv2 = nn.Conv1d(C1, C2, kernel_size=3, padding=1)
        # [B, C2, 6]
        self.fc1 = nn.Linear(C2*6, F1)
        # [B, F1]
        self.fc2 = nn.Linear(F1, F2)
        # [B, F2]
        self.fc3 = nn.Linear(F2, 2)
        # [B, 2]

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.maxpool(x)

        # no second pooling here, so the shape stays [B, C2, 6] as fc1 expects
        x = F.relu(self.conv2(x))

        x = x.flatten(1) # flatten the tensor starting at dimension 1

        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)

        return x
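
A quick way to sanity-check layer sizes like these is to push a dummy batch through the layers and print the shape after each one. The sketch below rebuilds the padded variant as an nn.Sequential purely for that purpose; the values of C1, C2, F1, F2 are arbitrary assumptions, not recommendations.

```python
import torch
import torch.nn as nn

C1, C2, F1, F2 = 16, 32, 64, 32  # arbitrary sizes chosen for illustration

model = nn.Sequential(
    nn.Conv1d(2, C1, kernel_size=3, padding=1),   # [B, C1, 18]
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=3, stride=3),        # [B, C1, 6]
    nn.Conv1d(C1, C2, kernel_size=3, padding=1),  # [B, C2, 6]
    nn.ReLU(),
    nn.Flatten(),                                 # [B, C2 * 6]
    nn.Linear(C2 * 6, F1),
    nn.ReLU(),
    nn.Linear(F1, F2),
    nn.ReLU(),
    nn.Linear(F2, 2),                             # [B, 2]
)

x = torch.randn(4, 2, 18)  # dummy batch shaped like the question's data

# Apply the layers one at a time and report the shape after each
for layer in model:
    x = layer(x)
    print(type(layer).__name__, tuple(x.shape))
```

If the in_features of the first nn.Linear does not match the flattened size, the loop fails at that layer, which makes the mismatch easy to locate.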
