Transfer Learning Using VGG16 on CIFAR 10 Dataset: Very High Training and Testing Accuracy But Wrong Predictions

Question

I trained the VGG16 model on the CIFAR-10 dataset using transfer learning. It reaches around 89% training accuracy after one epoch and around 89% testing accuracy too. However, when I use the trained model to predict labels for images from outside the dataset, it gives wrong answers. It even mislabels very clear images.

I've tried increasing the number of epochs to 20, which raises training and testing accuracy to around 93-94%, and I've tried many different images. The trained model predicts images from the dataset correctly but has trouble with new images.

#!/usr/bin/env python
# coding: utf-8

# In[1]:

from keras.models import load_model
import numpy as np
from tqdm import tqdm
from keras import models
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.applications.vgg16 import VGG16,preprocess_input
from keras.optimizers import Adam
from keras.models import Sequential, Model
from keras.layers import Dense, Flatten, GlobalAveragePooling2D
import pandas as pd
from keras.utils import np_utils
np.random.seed(123) 


# In[2]:


from keras.datasets import cifar10

(X_train, y_train), (X_test, y_test) = cifar10.load_data()


# In[3]:


print (X_train.shape)
print (X_test.shape)
#print (X_train[:2])

# In[4]:


from matplotlib import pyplot as plt


# In[5]:


X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255


# In[6]:


print (y_train.shape)
print (y_test[:10])



# In[7]:


Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)


# In[8]:



# In[9]:


import cv2


# In[24]:


train_set_x= X_train[:500]

train_set_y= Y_train[:500]
test_set_x= X_test[:100]
test_set_y= Y_test[:100]

# In[33]:

plt.imshow(X_test[1])
plt.show()

#train_set_y.shape


# In[27]:


frozen = VGG16 (weights="imagenet", input_shape=(32,32,3), include_top=False)


# In[28]:


frozen.summary()


# In[36]:


trainable = frozen.output
trainable = GlobalAveragePooling2D()(trainable)
#print(trainable.shape)
trainable = Dense(128, activation="relu")(trainable)
trainable = Dense(32, activation="relu")(trainable)
trainable = Dense(10, activation="softmax")(trainable)


# In[37]:


model = Model(inputs=frozen.input, outputs=trainable)


# In[38]:


model.summary()


# In[16]:


model.layers


# In[18]:


for layer in model.layers[:-4]:
    layer.trainable = False


# In[19]:


for layer in model.layers:
    print(layer, layer.trainable)


# In[40]:


learning_rate = 0.0001
opt = Adam(lr=learning_rate)
model.compile(optimizer=opt,
              loss='binary_crossentropy',
              metrics=['accuracy'])


# In[41]:


def evaluate_this_model(model, epochs):

    np.random.seed(1)

    history = model.fit(train_set_x, train_set_y, epochs=epochs)
    results = model.evaluate(test_set_x, test_set_y)

    plt.plot(np.squeeze(history.history["loss"]))
    plt.ylabel('cost')
    plt.xlabel('iterations (per tens)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

    print("\n\nAccuracy on training set is {}".format(history.history["acc"][-1]))
    print("\nAccuracy on test set is {}".format(results[1]))


# In[42]:


train_set_x.shape


evaluate_this_model(model, 1)
model.save('vgg16.h5')

model1=load_model('vgg16.h5')




IMG_SIZE=32
path1='../input/ship.png'
img_data1 = cv2.imread(path1, cv2.IMREAD_COLOR)
img_data1 = cv2.resize(img_data1, (IMG_SIZE, IMG_SIZE))
data1 = img_data1.reshape(-1, IMG_SIZE, IMG_SIZE, 3)
model_out=model1.predict(data1)

# CIFAR-10 class names, indexed by label
class_names = ['Airplane', 'Automobile', 'Bird', 'Cat', 'Deer',
               'Dog', 'Frog', 'Horse', 'Ship', 'Truck']
str_label = class_names[np.argmax(model_out)]
print(str_label)

The trained model labels dataset images correctly even after one epoch, but it gives entirely wrong labels for new images. For example, it labels a very clear image of a ship as a deer. The same happens for other classes.

score 1 · Answer 1 · answered Apr 20 '19 at 20:02


It looks like you're scaling the color of training and test data by dividing by 255. I don't see this happening for ship.png. I'd suggest creating a function that does all of the preprocessing and making sure to run it for training, test, and prediction so that you can be sure that you apply the exact same cleaning on all images.
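A minimal sketch of such a shared function, using only NumPy. The name `preprocess` and the `from_bgr` flag are illustrative and do not come from the question's code. The flag is there because `cv2.imread` returns channels in BGR order, while the CIFAR-10 arrays are RGB. The sketch assumes images are already 32x32 `uint8` arrays.

```python
import numpy as np

def preprocess(images, from_bgr=False):
    """Apply the same cleaning to training, test, and prediction images.

    images:   uint8 array of shape (H, W, 3) or (N, H, W, 3)
    from_bgr: set True for arrays loaded with cv2.imread (BGR order)
    """
    x = np.asarray(images)
    if x.ndim == 3:
        x = x[np.newaxis, ...]      # single image -> batch of one
    if from_bgr:
        x = x[..., ::-1]            # reverse the channel axis: BGR -> RGB
    return x.astype("float32") / 255.0  # same scaling as the training data

# Hypothetical usage with the question's variables:
#   X_train = preprocess(X_train)
#   X_test  = preprocess(X_test)
#   data1   = preprocess(cv2.resize(cv2.imread(path1), (32, 32)), from_bgr=True)
```

Routing every image through one function makes it harder for the prediction path to drift away from the training path.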


PatrickR2


Thanks for pointing that out and for the suggestion. I applied the fix you suggested; however, it didn't fix the problem. The ship went from being a deer to a cat. I tested with many other images as well. – Hamzah Malik Apr 20 '19 at 21:23
I think there's also an issue with your color channels. CIFAR10 is RGB (https://keras.io/datasets/), and a color image loaded by OpenCV is in BGR mode (https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_gui/py_image_display/py_image_display.html). – PatrickR2 Apr 21 '19 at 00:36
While I think my above two points still hold, the biggest issue is probably your loss function. You're using binary_crossentropy when you should be using categorical_crossentropy. Please see these posts about why you may want to use categorical_crossentropy as opposed to binary_crossentropy [here by desertnaut](https://stackoverflow.com/questions/42081257/keras-binary-crossentropy-vs-categorical-crossentropy-performance) and [here by sal-a](https://stackoverflow.com/questions/45799474/keras-model-evaluate-vs-model-predict-accuracy-difference-in-multi-class-nlp-ta/45834857#45834857) – PatrickR2 Apr 21 '19 at 01:49
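To illustrate the loss-function point, here is a rough NumPy sketch of why the reported accuracy can look high. As the linked posts describe, with `binary_crossentropy` Keras resolves `metrics=['accuracy']` to an element-wise binary accuracy. The two helper functions below are simplified stand-ins written for this example, not the Keras implementations. The "model" here is deliberately useless: it outputs 0.1 for every class.

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples, num_classes = 1000, 10

# Random one-hot labels, shaped like Y_train in the question.
labels = rng.integers(0, num_classes, size=num_samples)
y_true = np.eye(num_classes)[labels]

# A useless predictor: the same uniform probabilities for every image.
y_pred = np.full((num_samples, num_classes), 0.1)

def binary_accuracy(y_true, y_pred):
    # Element-wise: each of the 10 outputs is rounded and scored on its own.
    return np.mean(np.round(y_pred) == y_true)

def categorical_accuracy(y_true, y_pred):
    # Per image: is the highest-scoring class the right one?
    return np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1))

print("binary accuracy:     ", binary_accuracy(y_true, y_pred))
print("categorical accuracy:", categorical_accuracy(y_true, y_pred))
```

Every row scores 9 of its 10 outputs as correct, so the binary accuracy is 0.9 even though the predictor carries no information, which is close to the 89% reported after one epoch. Compiling with `loss='categorical_crossentropy'` should make the reported accuracy reflect per-image correctness.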
