
I am trying to implement a simple autoencoder using PyTorch. My dataset consists of 256 x 256 x 3 images. I have built a torch.utils.data.dataloader.DataLoader object which stores the images as tensors. When I run the autoencoder, I get a runtime error:

size mismatch, m1: [76800 x 256], m2: [784 x 128] at /Users/soumith/minicondabuild3/conda-bld/pytorch_1518371252923/work/torch/lib/TH/generic/THTensorMath.c:1434

These are my hyperparameters:

batch_size = 100
learning_rate = 1e-3
num_epochs = 100

Following is the architecture of my auto-encoder:

class autoencoder(nn.Module):
    def __init__(self):
        super(autoencoder, self).__init__()
        self.encoder = nn.Sequential(
            nn.Linear(3*256*256, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(True),
            nn.Linear(64, 12),
            nn.ReLU(True),
            nn.Linear(12, 3))

        self.decoder = nn.Sequential(
            nn.Linear(3, 12),
            nn.ReLU(True),
            nn.Linear(12, 64),
            nn.ReLU(True),
            nn.Linear(64, 128),
            nn.Linear(128, 3*256*256),
            nn.ReLU())

    def forward(self, x):
        x = self.encoder(x)
        #x = self.decoder(x)
        return x

This is the code I used to run the model:

for epoch in range(num_epochs):
    for data in dataloader:
        img = data['image']
        img = Variable(img)
        # ===================forward=====================
        output = model(img)
        loss = criterion(output, img)
        # ===================backward====================
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # ===================log========================
    print('epoch [{}/{}], loss:{:.4f}'
          .format(epoch + 1, num_epochs, loss.data[0]))
    if epoch % 10 == 0:
        pic = show_img(output.cpu().data)
        save_image(pic, './dc_img/image_{}.jpg'.format(epoch))
– Shreyas
  • On which line are you getting the error? What is the shape of `x` that you are passing to the forward function? Is the first linear layer in the encoder, `nn.Linear(3*256*256, 128)`, correct? – Wasi Ahmad Apr 02 '18 at 07:29
  • I am getting the error when I run `output = model(input)`. As per my knowledge, the linear layer flattens the image and executes something like a "y = Ax + b" operation. Since my input is a 256 x 256 x 3 image, the total number of elements would be the product of those dimensions. – Shreyas Apr 02 '18 at 07:32
  • I have added the code which I am using to train my model. – Shreyas Apr 02 '18 at 07:37
  • "As per my knowledge, the linear layer flattens the image". Did you test this assumption? Since, it doesn't seem to be true. – MaxPowers Apr 02 '18 at 07:39
  • The PyTorch documentation says so. Or at least what I inferred from it.http://pytorch.org/docs/master/nn.html#linear-layers – Shreyas Apr 02 '18 at 07:44

3 Answers


Whenever you have:

RuntimeError: size mismatch, m1: [a x b], m2: [c x d]

all you need to check is that b == c, and you are done:

m1 is [a x b] which is [batch size x in features]

m2 is [c x d] which is [in features x out features]
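
For example, here is a minimal sketch (the shapes are illustrative, not the asker's exact model) that shows the rule in action:

import torch
import torch.nn as nn

x = torch.randn(100, 256)      # m1 is [100 x 256], so b = 256

bad = nn.Linear(784, 128)      # m2 is [784 x 128], so c = 784; b != c
# bad(x)                       # raises a size-mismatch RuntimeError

good = nn.Linear(256, 128)     # m2 is [256 x 128], so c = 256 = b
out = good(x)                  # works; out is [100 x 128]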

– prosti
  • How can you calculate the value of b? It seems the value of c is determined by `ChannelIn` multiplied by `ChannelOut`. – KoKo Jun 09 '20 at 02:36
  • From my own experience I would like to add: if you cannot explain b by a sensible calculation (e.g. image height * image width * number of filters), most probably the input dimension of the pictures is different from what you assumed. E.g. I thought the input dim was 32x32 but it was 28x28. The model compiled until the dense layer, but b was a strange number. – very_interesting Oct 09 '20 at 13:47

If your input is 3 x 256 x 256, then you need to convert it to B x N to pass it through the linear layer nn.Linear(3*256*256, 128), where B is the batch size and N is the linear layer's input size. If you are feeding one image at a time, you can convert your input tensor of shape 3 x 256 x 256 to 1 x (3*256*256) as follows.

img = img.view(1, -1) # converts [3 x 256 x 256] to [1 x 196608]
output = model(img)
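
For a whole batch from the DataLoader, a sketch of the same idea (assuming batches shaped [B x 3 x 256 x 256], with `data['image']` following the asker's dataset layout):

img = data['image']               # [B x 3 x 256 x 256]
img = img.view(img.size(0), -1)   # flatten to [B x 196608], keeping the batch dim
output = model(img)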
– Wasi Ahmad

Your error:

size mismatch, m1: [76800 x 256], m2: [784 x 128]

says that the previous layer's output shape does not match the next layer's input shape:

[76800 x 256], m2: [784 x 128] # Incorrect!
[76800 x 256], m2: [256 x 128] # Correct!
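
One way to verify this (a sketch, assuming the autoencoder class and a batch tensor `img` from the question): compare the flattened input width against the first Linear layer's expected input size:

first = model.encoder[0]                        # first nn.Linear in the encoder
print(first.in_features, first.out_features)    # expect 196608 and 128 here
print(img.view(img.size(0), -1).size(1))        # must equal first.in_features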
– Scott