CNN (VGG-16) strange behaviour on validation accuracy

Question

I have built and tested two convolutional neural network models (VGG-16 and a 3-layer CNN) to classify lung CT scans for COVID-19.

Prior to the classification, I've performed image segmentation via k-means clustering on images to try to improve the classification performance.

The segmented images look like below.

I've trained and evaluated the VGG-16 model on the segmented images and on the raw images separately. Lastly, I trained and evaluated a 3-layer CNN on the segmented images only. Below are the results for their train/validation loss and accuracy.

For the simple 3-layer CNN model, I can clearly see that the model trains well and starts to overfit after about 2 epochs. But I don't understand why the validation accuracy of the VGG model doesn't look like an exponential curve; instead it looks like a flat or fluctuating horizontal line. Besides, the simple 3-layer CNN model seems to perform better. Is this due to vanishing gradients in the VGG model? Or are the images so simple that a deep architecture doesn't help? I'd appreciate it if you could share your knowledge about this learning behaviour of the models.

This is the code for the VGG-16 model:

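# imports assumed here; they were not shown in the original post
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense
from tensorflow.keras.optimizers import Adam

# build model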
img_height = 256
img_width = 256

model = Sequential()
model.add(Conv2D(input_shape=(img_height,img_width,1),filters=64,kernel_size=(3,3),padding="same", activation="relu"))
model.add(Conv2D(filters=64,kernel_size=(3,3),padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
model.add(Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
model.add(Flatten())
model.add(Dense(units=4096,activation="relu"))
model.add(Dense(units=4096,activation="relu"))
model.add(Dense(units=1, activation="sigmoid"))
opt = Adam(learning_rate=0.001)  # newer Keras versions use `learning_rate` instead of `lr`
model.compile(optimizer=opt, loss=keras.losses.binary_crossentropy, metrics=['accuracy'])

And this is the code for the 3-layer CNN:

# build model
model2 = Sequential()
model2.add(Conv2D(32, 3, padding='same', activation='relu',input_shape=(img_height, img_width, 1))) 
model2.add(MaxPool2D()) 
model2.add(Conv2D(64, 5, padding='same', activation='relu'))
model2.add(MaxPool2D())
model2.add(Flatten())
model2.add(Dense(128, activation='relu'))
model2.add(Dense(1, activation='sigmoid'))
opt = Adam(learning_rate=0.001)  # newer Keras versions use `learning_rate` instead of `lr`
model2.compile(optimizer=opt, loss=keras.losses.binary_crossentropy, metrics=['accuracy'])

Thank you!

@CAFEBABE There are 598 training and 148 testing images with 80-20 proportion. — traivsh, Jun 28 '20 at 04:10
Is 80-20 the distribution between the two classes or the train/test split? What is the class distribution? — CAFEBABE, Jun 28 '20 at 11:08
@CAFEBABE it's 349 positive and 397 negative, so the classes are fairly balanced. — traivsh, Jun 28 '20 at 12:00

score 2 · Accepted Answer · answered Jun 27 '20 at 08:28

Looking at the accuracies for what I assume is a binary problem, you can see that the model is just guessing at random (accuracy ~ 0.5). The fact that your 3-layer model gives much better results on the training set indicates that you are not training the VGG model long enough for it to overfit. In addition, you do not seem to use a proper initialization of the network. Note: early in an implementation, overfitting indicates that the training pipeline works, so at this stage it is a good thing. Therefore, the first step would be to get the model to overfit. You seem to be training from scratch. In that case it can take a few hundred epochs until the gradients affect the first convolutions of a complex model like VGG16.
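As a rough sanity check on the "random guessing" reading, the validation accuracy can be compared with what a constant predictor would score. This is only a sketch: the class counts are the ones the asker gives in the comments (349 positive, 397 negative), while the helper name `looks_like_chance` and its tolerance are hypothetical and chosen purely for illustration.

```python
import math

# Class counts as reported by the asker in the comments.
n_pos, n_neg = 349, 397
total = n_pos + n_neg

# Accuracy of a model that always predicts the larger class.
majority_acc = max(n_pos, n_neg) / total

# Binary cross-entropy of a model that outputs p = 0.5 for every image.
chance_loss = -math.log(0.5)


def looks_like_chance(val_acc, baseline=majority_acc, tol=0.03):
    """Hypothetical helper: True if val_acc is within tol of the baseline."""
    return abs(val_acc - baseline) <= tol


print(f"majority-class accuracy: {majority_acc:.3f}")
print(f"loss of a constant 0.5 prediction: {chance_loss:.3f}")
```

A validation loss that sits near ln 2 together with an accuracy near the majority-class rate is a typical sign that the network outputs roughly 0.5 for every image and has not started learning yet.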

As the 3-layer CNN seems to overfit quite heavily, I conclude that your dataset is rather small. Hence, I would recommend starting from a pre-trained model (VGG16) and re-training only the last two layers. This should give much better results.
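A minimal sketch of that suggestion in Keras, under several assumptions that are not stated in the thread: the grayscale input is copied to three channels because the ImageNet weights expect RGB, "the last two layers" is read as `block5_conv2` and `block5_conv3`, and the small classification head and the `1e-4` learning rate are placeholders. The function name `build_pretrained_vgg16` is hypothetical.

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16


def build_pretrained_vgg16(img_height=256, img_width=256, weights="imagenet",
                           trainable_layers=("block5_conv2", "block5_conv3")):
    """Hypothetical helper: VGG16 base with only the named layers trainable."""
    inputs = keras.Input(shape=(img_height, img_width, 1))
    # Assumption: repeat the single grayscale channel to mimic an RGB image.
    x = layers.Concatenate(axis=-1)([inputs, inputs, inputs])

    base = VGG16(include_top=False, weights=weights,
                 input_shape=(img_height, img_width, 3))
    # Freeze everything except the layers selected for re-training.
    for layer in base.layers:
        layer.trainable = layer.name in trainable_layers

    x = base(x)
    x = layers.GlobalAveragePooling2D()(x)
    # Assumption: a small new head instead of the original 4096-unit layers.
    x = layers.Dense(128, activation="relu")(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)

    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Calling `build_pretrained_vgg16()` downloads the ImageNet weights on first use; passing `weights=None` builds the same graph with random weights, which is handy for a quick shape check. The ImageNet weights were trained on inputs passed through `keras.applications.vgg16.preprocess_input`, so the pixel scaling of the CT images should be matched to that.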

CAFEBABE

Thanks for your inputs @CAFEBABE ! Yes, I trained VGG model from scratch. I'll try to train my model with more epochs and also try implementing the pre-trained VGG model. – traivsh Jun 28 '20 at 04:17
I tried increasing epochs and ran into GPU OOM problem. So to resolve this, I tried to train my model using multiprocessing but it's not working. @CAFEBABE If you've managed OOM problem or trained a model using multiprocessing, I'd appreciate your insights for my problem. https://stackoverflow.com/questions/62620104/keras-not-running-in-multiprocessing – traivsh Jun 28 '20 at 08:34
Concerning your OOM https://stackoverflow.com/questions/46981853/tensorflow-gpu-oom-issue-after-several-epochs This shouldn't happen. You have so little data and the images are rather small. This should easily work on a single CPU. Especially as the data is not sufficient for training VGG16 from scratch anyhow – CAFEBABE Jun 28 '20 at 11:06
I've encountered OOM after running model.fit with callback multiple times. I think as described in the post that you linked, there was GPU leakage. I'll try to follow the answer from the post. And besides, I've tried couple of things on training VGG-16 from scratch. I'll post the results as an answer. – traivsh Jun 28 '20 at 12:40

score 1 · Answer 2 · answered Jun 28 '20 at 15:58

Following what @CAFEBABE suggested, I tried two approaches. First, I increased the number of epochs to 200, changed the optimiser to SGD and reduced the learning rate to 1e-5. Second, I loaded pre-trained weights for the VGG-16 model and trained only the last two convolutional layers. Below is the plot showing the tuned VGG-16 model, the pre-trained VGG-16 model and the 3-layer CNN model (from top to bottom).

Tuning certainly had an effect on the performance, but it was very marginal. I guess the learnable features in a dataset of ~600 images were not sufficient to train the model. The pre-trained weights helped significantly, with the model reaching overfitting at ~25 epochs. However, compared with the 3-layer CNN model, the testing accuracies of these two models are similar, ranging between 0.7 and 0.8. I guess this is again due to the limitations of the dataset.
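To put the "not enough images" guess in numbers, here is a back-of-the-envelope parameter count for the two architectures from the question (256x256x1 input, "same" padding, a 2x2 max-pool after every block). The helper names are made up for this sketch; the layer lists are copied from the posted code.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a square-kernel Conv2D layer."""
    return kernel * kernel * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights plus biases of a Dense layer."""
    return n_in * n_out + n_out


def count_params(conv_blocks, dense_units, size=256, channels=1):
    """conv_blocks: list of blocks, each a list of (filters, kernel) pairs.
    A 2x2 max-pool is assumed after every block, halving the spatial size."""
    total = 0
    for block in conv_blocks:
        for filters, kernel in block:
            total += conv_params(kernel, channels, filters)
            channels = filters
        size //= 2
    n_in = size * size * channels  # length of the Flatten output
    for units in dense_units:
        total += dense_params(n_in, units)
        n_in = units
    return total


vgg_blocks = [[(64, 3)] * 2, [(128, 3)] * 2, [(256, 3)] * 3,
              [(512, 3)] * 3, [(512, 3)] * 3]
vgg_total = count_params(vgg_blocks, [4096, 4096, 1])

small_blocks = [[(32, 3)], [(64, 5)]]
small_total = count_params(small_blocks, [128, 1])

n_train = 598  # training images, as stated in the comments
print(f"VGG-16: {vgg_total:,} parameters, {vgg_total // n_train:,} per image")
print(f"3-layer CNN: {small_total:,} parameters, {small_total // n_train:,} per image")
```

By this count the from-scratch VGG-16 has roughly 166 million parameters and the 3-layer CNN roughly 34 million, against 598 training images, so both are heavily over-parameterised for this dataset and the deeper model much more so.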

Thanks again to @CAFEBABE for helping with my problem. I hope this helps other people who face a similar problem.

You can also try Adam optimizer, it may lead to a faster convergence. — Rishabh Agrahari, Jun 28 '20 at 16:03
