Keras prediction accuracy does not match training accuracy

Question

I used some spare time to quickly learn some Python and Keras. I created an image set of 4,050 images of class a (Clover) and 2,358 images of class b (Grass). More classes might be added later, so I did not use the binary class_mode.

The images are organised in one subfolder per class, and I divided them randomly into 70% training and 30% test/validation data with the corresponding folder structure. The image files themselves are not normalised yet; the rescaling by 1/255 happens in the generators.
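The question does not show how the random 70/30 split was made. A minimal sketch of one way to do it per class, using only the standard library; the helper name `split_files` and its `seed` parameter are hypothetical:

```python
import random

def split_files(filenames, train_fraction=0.7, seed=0):
    """Randomly split one class's file names into (train, validate) lists.

    Hypothetical helper, not the author's code.
    """
    files = sorted(filenames)        # sort first so the result depends only on the seed
    rng = random.Random(seed)        # local RNG, leaves the global random state untouched
    rng.shuffle(files)
    cut = int(round(train_fraction * len(files)))
    return files[:cut], files[cut:]
```

Calling it once per class folder and copying the two lists into `Train/<class>/` and `Validate/<class>/` (for example with `shutil.copy`) would give the layout the generators expect.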

I trained the model and saved the results. I get a training accuracy of around 90%. When I now predict single images (which is the desired use case), the average accuracy of these predictions is ~64%, which is very close to the share of class a images in the whole set (4,050 / (4,050 + 2,358) ≈ 63%). For this test I used random images from the actual dataset, but the same bad results appear with genuinely new data. Looking at the predictions, the model mostly predicts class a and only occasionally class b. Why is this happening? I do not know what is wrong. Can you have a look?
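The ~63% figure is the accuracy of a trivial classifier that always answers "Clover". A small sketch makes that majority-class baseline explicit (the helper name `majority_baseline` is illustrative); a single-image accuracy close to it suggests the model is effectively predicting the majority class for most inputs.

```python
def majority_baseline(class_counts):
    """Accuracy of a classifier that always predicts the most frequent class."""
    return max(class_counts.values()) / sum(class_counts.values())

counts = {"Clover": 4050, "Grass": 2358}   # image counts from the question
print(f"{majority_baseline(counts):.1%}")  # prints 63.2%
```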

The model is built here:

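import os
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from tensorflow.keras.preprocessing.image import ImageDataGenerator

global_dir = "path/to/dataset"  # placeholder: folder containing Train/, Validate/ and Model/

epochs = 50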
IMG_HEIGHT = 50
IMG_WIDTH = 50

train_image_generator = ImageDataGenerator(
                    rescale=1./255,
                    rotation_range=45,
                    width_shift_range=.15,
                    height_shift_range=.15,
                    horizontal_flip=True,
                    zoom_range=0.1)


validation_image_generator = ImageDataGenerator(rescale=1./255)
train_path = os.path.join(global_dir,"Train")
validate_path = os.path.join(global_dir,"Validate")

train_data_gen = train_image_generator.flow_from_directory(directory=train_path,
                                                               shuffle=True,
                                                               target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                               class_mode='categorical')
val_data_gen = validation_image_generator.flow_from_directory(directory=validate_path,
                                                               shuffle=True,
                                                               target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                               class_mode='categorical')


model = Sequential([
        Conv2D(16, 3, padding='same', activation='relu',
               input_shape=(IMG_HEIGHT, IMG_WIDTH, 3)),
        MaxPooling2D(),
        Conv2D(32, 3, padding='same', activation='relu'),
        MaxPooling2D(),
        Dropout(0.2),
        Conv2D(64, 3, padding='same', activation='relu'),
        MaxPooling2D(),
        Dropout(0.2),
        Flatten(),
        Dense(512, activation='relu'),
        Dense(64, activation='relu'),
        Dense(2, activation='softmax')
    ])

model.compile(optimizer='adam',
              loss=keras.losses.categorical_crossentropy,
              metrics=['accuracy'])

model.summary()

history = model.fit(
    train_data_gen,  # batch size comes from flow_from_directory (default 32), not from fit()
    epochs=epochs,
    validation_data=val_data_gen
)

model.save(global_dir + "/Model/1")

The training output is the following:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 50, 50, 16)        448       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 25, 25, 16)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 25, 25, 32)        4640      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 32)        0         
_________________________________________________________________
dropout (Dropout)            (None, 12, 12, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 12, 12, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 6, 6, 64)          0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 6, 6, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 2304)              0         
_________________________________________________________________
dense (Dense)                (None, 512)               1180160   
_________________________________________________________________
dense_1 (Dense)              (None, 64)                32832     
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 130       
=================================================================
Total params: 1,236,706
Trainable params: 1,236,706
Non-trainable params: 0
_________________________________________________________________
Epoch 1/50
141/141 [==============================] - 14s 102ms/step - loss: 0.6216 - accuracy: 0.6468 - val_loss: 0.5396 - val_accuracy: 0.7120
Epoch 2/50
141/141 [==============================] - 12s 86ms/step - loss: 0.5129 - accuracy: 0.7488 - val_loss: 0.4427 - val_accuracy: 0.8056
Epoch 3/50
141/141 [==============================] - 12s 86ms/step - loss: 0.4917 - accuracy: 0.7624 - val_loss: 0.5004 - val_accuracy: 0.7705
Epoch 4/50
141/141 [==============================] - 15s 104ms/step - loss: 0.4510 - accuracy: 0.7910 - val_loss: 0.4226 - val_accuracy: 0.8198
Epoch 5/50
141/141 [==============================] - 12s 85ms/step - loss: 0.4056 - accuracy: 0.8219 - val_loss: 0.3439 - val_accuracy: 0.8514
Epoch 6/50
141/141 [==============================] - 12s 84ms/step - loss: 0.3904 - accuracy: 0.8295 - val_loss: 0.3207 - val_accuracy: 0.8646
Epoch 7/50
141/141 [==============================] - 12s 85ms/step - loss: 0.3764 - accuracy: 0.8304 - val_loss: 0.3185 - val_accuracy: 0.8702
Epoch 8/50
141/141 [==============================] - 12s 87ms/step - loss: 0.3695 - accuracy: 0.8362 - val_loss: 0.2958 - val_accuracy: 0.8743
Epoch 9/50
141/141 [==============================] - 12s 84ms/step - loss: 0.3455 - accuracy: 0.8574 - val_loss: 0.3096 - val_accuracy: 0.8687
Epoch 10/50
141/141 [==============================] - 12s 84ms/step - loss: 0.3483 - accuracy: 0.8473 - val_loss: 0.3552 - val_accuracy: 0.8412
Epoch 11/50
141/141 [==============================] - 12s 84ms/step - loss: 0.3362 - accuracy: 0.8616 - val_loss: 0.3004 - val_accuracy: 0.8804
Epoch 12/50
141/141 [==============================] - 12s 85ms/step - loss: 0.3277 - accuracy: 0.8616 - val_loss: 0.2974 - val_accuracy: 0.8733
Epoch 13/50
141/141 [==============================] - 12s 85ms/step - loss: 0.3243 - accuracy: 0.8589 - val_loss: 0.2732 - val_accuracy: 0.8931
Epoch 14/50
141/141 [==============================] - 12s 84ms/step - loss: 0.3324 - accuracy: 0.8563 - val_loss: 0.2568 - val_accuracy: 0.8941
Epoch 15/50
141/141 [==============================] - 12s 84ms/step - loss: 0.3071 - accuracy: 0.8701 - val_loss: 0.2706 - val_accuracy: 0.8911
Epoch 16/50
141/141 [==============================] - 12s 84ms/step - loss: 0.3114 - accuracy: 0.8696 - val_loss: 0.2503 - val_accuracy: 0.9059
Epoch 17/50
141/141 [==============================] - 12s 85ms/step - loss: 0.2978 - accuracy: 0.8794 - val_loss: 0.2853 - val_accuracy: 0.8896
Epoch 18/50
141/141 [==============================] - 12s 85ms/step - loss: 0.3029 - accuracy: 0.8725 - val_loss: 0.2458 - val_accuracy: 0.9033
Epoch 19/50
141/141 [==============================] - 12s 84ms/step - loss: 0.2988 - accuracy: 0.8721 - val_loss: 0.2713 - val_accuracy: 0.8916
Epoch 20/50
141/141 [==============================] - 12s 88ms/step - loss: 0.2960 - accuracy: 0.8747 - val_loss: 0.2649 - val_accuracy: 0.8926
Epoch 21/50
141/141 [==============================] - 13s 92ms/step - loss: 0.2901 - accuracy: 0.8819 - val_loss: 0.2611 - val_accuracy: 0.8957
Epoch 22/50
141/141 [==============================] - 12s 89ms/step - loss: 0.2879 - accuracy: 0.8821 - val_loss: 0.2497 - val_accuracy: 0.8947
Epoch 23/50
141/141 [==============================] - 12s 88ms/step - loss: 0.2831 - accuracy: 0.8817 - val_loss: 0.2396 - val_accuracy: 0.9069
Epoch 24/50
141/141 [==============================] - 12s 89ms/step - loss: 0.2856 - accuracy: 0.8799 - val_loss: 0.2386 - val_accuracy: 0.9059
Epoch 25/50
141/141 [==============================] - 12s 87ms/step - loss: 0.2834 - accuracy: 0.8817 - val_loss: 0.2472 - val_accuracy: 0.9048
Epoch 26/50
141/141 [==============================] - 12s 88ms/step - loss: 0.3038 - accuracy: 0.8768 - val_loss: 0.2792 - val_accuracy: 0.8835
Epoch 27/50
141/141 [==============================] - 13s 91ms/step - loss: 0.2786 - accuracy: 0.8854 - val_loss: 0.2326 - val_accuracy: 0.9079
Epoch 28/50
141/141 [==============================] - 12s 86ms/step - loss: 0.2692 - accuracy: 0.8846 - val_loss: 0.2325 - val_accuracy: 0.9115
Epoch 29/50
141/141 [==============================] - 12s 88ms/step - loss: 0.2770 - accuracy: 0.8841 - val_loss: 0.2507 - val_accuracy: 0.8972
Epoch 30/50
141/141 [==============================] - 13s 92ms/step - loss: 0.2751 - accuracy: 0.8886 - val_loss: 0.2329 - val_accuracy: 0.9104
Epoch 31/50
141/141 [==============================] - 12s 88ms/step - loss: 0.2902 - accuracy: 0.8785 - val_loss: 0.2901 - val_accuracy: 0.8758
Epoch 32/50
141/141 [==============================] - 13s 94ms/step - loss: 0.2665 - accuracy: 0.8915 - val_loss: 0.2314 - val_accuracy: 0.9089
Epoch 33/50
141/141 [==============================] - 13s 91ms/step - loss: 0.2797 - accuracy: 0.8805 - val_loss: 0.2708 - val_accuracy: 0.8921
Epoch 34/50
141/141 [==============================] - 13s 90ms/step - loss: 0.2895 - accuracy: 0.8799 - val_loss: 0.2332 - val_accuracy: 0.9140
Epoch 35/50
141/141 [==============================] - 13s 93ms/step - loss: 0.2696 - accuracy: 0.8857 - val_loss: 0.2512 - val_accuracy: 0.8972
Epoch 36/50
141/141 [==============================] - 13s 90ms/step - loss: 0.2641 - accuracy: 0.8868 - val_loss: 0.2304 - val_accuracy: 0.9104
Epoch 37/50
141/141 [==============================] - 13s 94ms/step - loss: 0.2675 - accuracy: 0.8895 - val_loss: 0.2706 - val_accuracy: 0.8830
Epoch 38/50
141/141 [==============================] - 12s 88ms/step - loss: 0.2699 - accuracy: 0.8839 - val_loss: 0.2285 - val_accuracy: 0.9053
Epoch 39/50
141/141 [==============================] - 12s 87ms/step - loss: 0.2577 - accuracy: 0.8917 - val_loss: 0.2469 - val_accuracy: 0.9043
Epoch 40/50
141/141 [==============================] - 12s 87ms/step - loss: 0.2547 - accuracy: 0.8948 - val_loss: 0.2205 - val_accuracy: 0.9074
Epoch 41/50
141/141 [==============================] - 12s 86ms/step - loss: 0.2553 - accuracy: 0.8930 - val_loss: 0.2494 - val_accuracy: 0.9038
Epoch 42/50
141/141 [==============================] - 14s 97ms/step - loss: 0.2705 - accuracy: 0.8883 - val_loss: 0.2263 - val_accuracy: 0.9109
Epoch 43/50
141/141 [==============================] - 12s 88ms/step - loss: 0.2521 - accuracy: 0.8926 - val_loss: 0.2319 - val_accuracy: 0.9084
Epoch 44/50
141/141 [==============================] - 12s 84ms/step - loss: 0.2694 - accuracy: 0.8850 - val_loss: 0.2199 - val_accuracy: 0.9109
Epoch 45/50
141/141 [==============================] - 12s 83ms/step - loss: 0.2601 - accuracy: 0.8901 - val_loss: 0.2318 - val_accuracy: 0.9079
Epoch 46/50
141/141 [==============================] - 12s 83ms/step - loss: 0.2535 - accuracy: 0.8917 - val_loss: 0.2342 - val_accuracy: 0.9089
Epoch 47/50
141/141 [==============================] - 12s 84ms/step - loss: 0.2584 - accuracy: 0.8897 - val_loss: 0.2238 - val_accuracy: 0.9089
Epoch 48/50
141/141 [==============================] - 12s 83ms/step - loss: 0.2580 - accuracy: 0.8944 - val_loss: 0.2219 - val_accuracy: 0.9120
Epoch 49/50
141/141 [==============================] - 12s 83ms/step - loss: 0.2514 - accuracy: 0.8895 - val_loss: 0.2225 - val_accuracy: 0.9150
Epoch 50/50
141/141 [==============================] - 12s 83ms/step - loss: 0.2483 - accuracy: 0.8977 - val_loss: 0.2370 - val_accuracy: 0.9084
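The 141 steps per epoch in the log can be cross-checked with a little arithmetic. Assuming an exact 70% split (the question does not give the exact training count, so ~4,486 images is an estimate), the default `batch_size=32` of `flow_from_directory` yields 141 batches per epoch, whereas 200 would yield 23. This suggests the `batch_size=200` passed to `fit` had no effect on the generator input.

```python
import math

total_images = 4050 + 2358                # Clover + Grass, from the question
train_images = round(0.7 * total_images)  # assumption: exact 70% split, about 4486

def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    # A Keras directory iterator yields ceil(n / batch_size) batches per epoch
    return math.ceil(n_samples / batch_size)

print(steps_per_epoch(train_images, 32))   # 141: default batch_size of flow_from_directory
print(steps_per_epoch(train_images, 200))  # 23: what batch_size=200 would give
```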

The history plot looks like this: [image not preserved]

The prediction is done with this code:

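import cv2 as cv
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model(global_dir + "/Model/1")

def predict_image(image):
    image = cv.resize(image, (50, 50))
    image = image.astype('float32') / 255

    image = np.expand_dims(image, axis=0)  # add batch dimension: (1, 50, 50, 3)

    predictions = model.predict(image)
    top = np.array(tf.argmax(predictions, 1))

    result = top[0]
    return result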

This function collects all the input images, stores each one with its class label (0 or 1), and then shuffles the list. After that, I iterate over the list, predict each image, and compare the result with the actual class.

import os
import random
import cv2 as cv

def test_model():
    dir_good = os.fsencode(global_dir + "/Contours/Clover")
    dir_bad = os.fsencode(global_dir + "/Contours/Grass")
    test = []
    for file2 in os.listdir(dir_good):
        filename2 = os.fsdecode(file2)
        if (filename2.endswith(".jpg")):
            test.append([0,os.path.join(global_dir + "/Contours/Clover", filename2)])
    for file2 in os.listdir(dir_bad):
        filename2 = os.fsdecode(file2)
        if (filename2.endswith(".jpg")):
            test.append([1,os.path.join(global_dir + "/Contours/Grass", filename2)])

    random.shuffle(test)
    count = 0
    right = 0
    for i in range(0,len(test)):
        tmp = cv.imread(test[i][1])
        result = predict_image(tmp) #<--- this function is already quoted above
        count += 1
        right += (1 if result == test[i][0] else 0)
        print(str(test[i][0]) + "->" + str(result),count,right,round(right/count*100,1))
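The labels 0 (Clover) and 1 (Grass) hard-coded in `test_model` only line up with the model's outputs if `flow_from_directory` assigned the same indices. When `classes` is not passed, Keras sorts the class sub-folder names alphanumerically and numbers them from 0; the real mapping can be read from `train_data_gen.class_indices`. The pure-Python sketch below (hypothetical helper `infer_class_indices`) mimics that rule to show why the question's labels happen to match.

```python
def infer_class_indices(subdir_names):
    """Map class sub-folder names to label indices by sorting them,
    mimicking flow_from_directory when `classes` is not given."""
    return {name: i for i, name in enumerate(sorted(subdir_names))}

print(infer_class_indices(["Grass", "Clover"]))  # {'Clover': 0, 'Grass': 1}
```

Adding a third class folder would shift indices according to the sorted order, so comparing against `class_indices` rather than hard-coded numbers is the safer habit.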

Thank you in advance! Cheers, Seb

This is a very broad question since you are seeking answers to the fundamentals on deep learning and not on programming. Maybe stack-sites like _Cross Validated_ or _Data Science_ are better suited for this question. — Markus, Jun 09 '20 at 21:52
When fitting your model, the training and validation accuracy are both hovering at around 90% which doesn't indicate overfitting of any kind. You mentioned that you got a 63% accuracy by predicting each image one at a time and aggregating the results that way. Please show us that code. That is most likely the culprit in the discrepancy you are facing. BTW, you don't need to use `tf.argmax`. You can use `np.argmax` as the output of the model will be a NumPy array. Keras is nice enough to do that for us. — rayryeng, Jun 10 '20 at 02:20
Hi @rayryeng, I added the code for the model testing at the end of the post. I already confirmed that the shuffle command does not mix up the class/image pairing. I have now run the process over all existing files and got an accuracy of 77.8%. That is higher than expected, but not the ~90% reported for the model. — seb2010, Jun 10 '20 at 06:16
@seb2010 `cv2.imread` reads in images in BGR format. Internally, Keras data generators load in images in RGB format. You must reverse the channels prior to inference: `tmp = tmp[...,::-1]` — rayryeng, Jun 10 '20 at 06:31
@rayryeng: Jesus! I cannot believe that this was the cause. I would have expected this kind of wrong input to lead to much lower prediction accuracies. With your fix, I am getting the expected hit rate. Thank you very much!! — seb2010, Jun 10 '20 at 06:35
@seb2010 haha no problem. May I write an answer to close this off? — rayryeng, Jun 10 '20 at 06:35

score 1 · Accepted Answer · answered Jun 10 '20 at 06:36


As stated in our conversation, you are using cv2.imread to load the images, which returns the colour channels in BGR order. Keras data generators load images internally in RGB order. You must reverse the channels before inference:

tmp = tmp[...,::-1]
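A minimal sketch of the corrected preprocessing, assuming the image has already been resized to 50×50 with `cv.resize` as in the question's `predict_image`. The helper name `to_model_input` is hypothetical; `cv.cvtColor(image, cv.COLOR_BGR2RGB)` is an equivalent OpenCV call for the channel swap.

```python
import numpy as np

def to_model_input(bgr_image: np.ndarray) -> np.ndarray:
    """Turn one resized OpenCV-style (H, W, 3) BGR uint8 image into a
    (1, H, W, 3) RGB float32 batch, matching what the generators produced."""
    rgb = bgr_image[..., ::-1]              # BGR -> RGB: reverse the channel axis
    scaled = rgb.astype("float32") / 255.0  # same rescale=1./255 as the generators
    return np.expand_dims(scaled, axis=0)   # add the batch dimension

# A pure-blue pixel is [255, 0, 0] in BGR and becomes [0., 0., 1.] in RGB
blue = np.zeros((50, 50, 3), dtype=np.uint8)
blue[..., 0] = 255
x = to_model_input(blue)
```

Keeping this conversion in one helper used by every inference path avoids the training/inference mismatch described above.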

answered Jun 10 '20 at 06:36

rayryeng



This was the solution. Thank you – seb2010 Jun 10 '20 at 06:45

score 0 · Answer 2 · answered Jun 09 '20 at 21:38


Well, it seems you bumped into the class overfitting issue. You can diagnose this by looking at the plots of the loss function on the training and validation sets after the model is trained.

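import matplotlib.pyplot as plt

# model.fit returns a History object; the per-epoch metrics live in its .history dict
plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['val_loss'], label='val_loss')
plt.legend()
plt.show()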

There are a bunch of possible fixes, but which one applies depends on that diagnosis. See this amazing answer about overfitting.

answered Jun 09 '20 at 21:38

parsethis


I already use dropout to avoid overfitting, and the acc and val_acc curves increase nearly in parallel, while the loss and val_loss curves decrease. I cannot use plt due to a corrupted environment, but I added the training output above, where you can read the acc and loss values. Do they seem suspicious? And how could an overfitted model reach 90% accuracy on 2,000 validation images? – seb2010 Jun 09 '20 at 21:48
I managed to plot the history results and pasted the history plot into my question above – seb2010 Jun 09 '20 at 22:11
Interesting. From the plots it doesn't seem like the model is overfit. At this point, I'd look for bugs in the error computation logic, make sure the preprocessing is the same for the images that are used for the test, etc. Good hunting. – parsethis Jun 09 '20 at 22:18
