I am trying to train a model for a binary classification problem where the images are single-channel infrared (temperature) images. After converting them to three channels (by replicating the single channel), I tried two CNN architectures, VGG-11 and VGG-16, but didn't manage to get stable training: accuracy stays low, and after 2-10 epochs (depending on the learning-rate adjustment) the loss freezes at some value.
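For reference, the channel replication can be sketched as follows. This assumes PyTorch and a batch laid out as (N, 1, H, W); the random tensor is a hypothetical stand-in for the real infrared data.

```python
import torch

# Hypothetical batch of 4 single-channel infrared images (340x340).
x = torch.rand(4, 1, 340, 340)

# Copy the single channel three times along the channel dimension.
x3 = x.repeat(1, 3, 1, 1)

# Alternative: expand() returns a view without copying the data.
x3_view = x.expand(-1, 3, -1, -1)

print(tuple(x3.shape))
```

All three channels then carry identical values, so the replication adds no new information; it only matches the input shape that a standard VGG expects.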
The standard VGG architecture is used, except for an AdaptiveAvgPool2d() layer, which is used first in order to accommodate inputs of arbitrary size. The input size of the images is 340x340.
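A small shape check of what the adaptive pooling does for 340x340 inputs. This is only a sketch: it assumes the layer sits between the convolutional features and the classifier with a 7x7 target size, as in torchvision's VGG, which may not match the exact placement used here. The convolutions are omitted because VGG's 3x3 convolutions with padding 1 keep the spatial size.

```python
import torch
import torch.nn as nn

# VGG halves the spatial size with five 2x2 max-pool stages.
pool = nn.MaxPool2d(kernel_size=2, stride=2)
# Assumed target size (7x7), as in torchvision's VGG.
adaptive = nn.AdaptiveAvgPool2d((7, 7))

x = torch.rand(1, 1, 340, 340)
feat = x
for _ in range(5):
    feat = pool(feat)  # 340 -> 170 -> 85 -> 42 -> 21 -> 10

out = adaptive(feat)  # fixed 7x7 output regardless of the input size
print(tuple(feat.shape), tuple(out.shape))
```

With this assumption, the classifier always receives a 7x7 feature map, whatever the original image size.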
CrossEntropyLoss() is applied to the output of the aforementioned network, with labels [0,1]. Also, the model is trained from scratch (because of the nature of the data).
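A minimal sketch of the loss setup, assuming the network ends in two output units (one logit per class) and the labels are integer class indices. The zero logits are a hypothetical stand-in for a network that gives the same score to both classes.

```python
import math

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Raw scores of shape (N, 2); CrossEntropyLoss applies log-softmax itself,
# so no softmax layer should precede it.
logits = torch.zeros(4, 2)
# Integer class indices (dtype int64), one per sample.
labels = torch.tensor([0, 1, 1, 0])

loss = criterion(logits, labels)
expected = math.log(2)  # loss of a 50/50 prediction on two classes
print(loss.item(), expected)
```

If the value at which the loss freezes is close to ln 2 (about 0.693), that would suggest the network is predicting both classes with equal probability for every input.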
Any ideas for adapting my architecture to the needs of my problem?
I haven't found many works on infrared-image classification, so any help would be highly appreciated.