
I have an image dataset with 17 classes. There is a significant difference in image sizes between some classes, e.g., images in one class are on average 250x200 and in another 25x25. My idea was to concatenate the output of a pretrained resnet18 with the original image size, because I think it is valuable information for classification.

To be more specific: I would like to use resnet18, but in addition to the input of its last layer, which is (fc): Linear(in_features=512, out_features=17, bias=True), I would also like to feed in the image shape (Image.Shape), which might be important for better classification.
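Something like this is what I have in mind, though I am not sure it is correct (an untested sketch; ShapeAwareResNet and the way I pass the sizes are just placeholders):

import torch
import torch.nn as nn
from torchvision import models

class ShapeAwareResNet(nn.Module):
    # Hypothetical wrapper: resnet18 features concatenated with the original image size.
    def __init__(self, num_classes=17):
        super().__init__()
        self.backbone = models.resnet18(pretrained=True)
        self.backbone.fc = nn.Identity()            # keep the 512-d embedding instead of logits
        self.fc = nn.Linear(512 + 2, num_classes)   # +2 for (height, width)

    def forward(self, x, sizes):
        # x: resized batch (N, 3, H, W); sizes: original (height, width) per image, shape (N, 2)
        feats = self.backbone(x)                    # (N, 512)
        return self.fc(torch.cat([feats, sizes], dim=1))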

Is this a reasonable solution for this kind of problem, and is there a way to do it in PyTorch?

Dobiks

1 Answer


I guess you want to use the embeddings produced by the adaptive pooling layer, which has a fixed output size. First you need to get rid of the last linear layer (see this post):

import torch
from torchvision import models

model = models.resnet18(pretrained=True)  # resnet18, to match the question
newmodel = torch.nn.Sequential(*(list(model.children())[:-1]))  # everything except the final fc layer

Then you can get the embeddings and use torch.cat for concatenation:

out = torch.flatten(newmodel(X), 1)                      # (N, 512) embeddings
shape = out.new_tensor(X.shape[-2:]).expand(len(X), -1)  # (N, 2): height and width of the (resized) batch
emb = torch.cat([out, shape], dim=1)                     # (N, 514)
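Then emb can be fed into a small MLP classifier, for example (a sketch; the hidden size is arbitrary):

mlp = torch.nn.Sequential(
    torch.nn.Linear(512 + 2, 128),  # 512 embedding dims + 2 for (height, width)
    torch.nn.ReLU(),
    torch.nn.Linear(128, 17),       # 17 classes
)
logits = mlp(emb)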
theophile
  • Hi, thanks for your reply. I assume I should feed 'emb' into the MLP, but how should I train 'newmodel'? Or should I compute out = newmodel(X) and emb = torch.cat([out, shape], dim=1) inside a training loop? Sorry if I'm missing something, but this kind of problem is new to me. – Dobiks Jul 06 '22 at 17:40
  • To be precise about what I'm trying to do: I would like to get features from resnet18 and feed them into an MLP classifier together with Image.Shape, which might be important for proper classification. – Dobiks Jul 06 '22 at 17:44
  • The last layer of ResNet18 is a dense layer that gives the class probabilities for the ImageNet challenge, which is not what you want, so you have to remove this fully connected layer to get the embedding. Thus newmodel only keeps the convolutions and the pooling from ResNet; you don't have to train it, because it keeps the original weights that produce the embeddings in out. Then you can concatenate them with the shape to get the input of your MLP. Please tell me if this is not clear and you want more explanation. – theophile Jul 07 '22 at 07:49
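
To make the last comment concrete, here is a minimal training-loop sketch (my own assumption of how the pieces fit together; loader is assumed to yield images, the original per-image sizes, and labels, and mlp is the classifier defined above):

for p in newmodel.parameters():
    p.requires_grad = False                       # freeze the pretrained backbone
newmodel.eval()

optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()

for X, sizes, y in loader:                        # sizes: original (height, width), shape (N, 2)
    with torch.no_grad():
        out = torch.flatten(newmodel(X), 1)       # (N, 512) frozen embeddings
    emb = torch.cat([out, sizes.float()], dim=1)  # (N, 514)
    loss = criterion(mlp(emb), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()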