ResNet: 100% accuracy during training, but 33% prediction accuracy with the same data

Question

I am new to machine learning and deep learning, and for learning purposes I tried to play with ResNet. I tried to overfit a small dataset (3 different images) to see whether I could get almost 0 loss and 1.0 accuracy, and I did.

The problem is that predictions on the training images (i.e. the same 3 images used for training) are not correct.

Training Images

Image labels

[1,0,0], [0,1,0], [0,0,1]

My Python code

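import os
import numpy as np
from PIL import Image
from keras.applications.resnet50 import ResNet50

# load the 3 images and resize them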
imgs = np.array([np.array(Image.open("./Images/train/" + fname)
                          .resize((197, 197), Image.ANTIALIAS)) for fname in
                 os.listdir("./Images/train/")]).reshape(-1,197,197,1)
# creating labels
y = np.array([[1,0,0],[0,1,0],[0,0,1]])
# create resnet model
model = ResNet50(input_shape=(197, 197,1),classes=3,weights=None)

# compile & fit model
model.compile(loss='categorical_crossentropy', optimizer='adam',metrics=['acc'])

model.fit(imgs,y,epochs=5,shuffle=True)

# predict on training data
print(model.predict(imgs))

The model does overfit the data:

Epoch 1/5
3/3 [==============================] - 22s - loss: 1.3229 - acc: 0.0000e+00
Epoch 2/5
3/3 [==============================] - 0s - loss: 0.1474 - acc: 1.0000
Epoch 3/5
3/3 [==============================] - 0s - loss: 0.0057 - acc: 1.0000
Epoch 4/5
3/3 [==============================] - 0s - loss: 0.0107 - acc: 1.0000
Epoch 5/5
3/3 [==============================] - 0s - loss: 1.3815e-04 - acc: 1.0000

but predictions are:

 [[  1.05677405e-08   9.99999642e-01   3.95520459e-07]
 [  1.11955103e-08   9.99999642e-01   4.14905685e-07]
 [  1.02637095e-07   9.99997497e-01   2.43751242e-06]]

which means that all three images were assigned the label [0,1,0].
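
One quick way to confirm this reading is to take the argmax of each predicted row and compare it with the argmax of the matching one-hot label. The sketch below simply hard-codes the values printed above; the variable and helper names are only illustrative.

```python
# Predicted probabilities copied from the model.predict output above.
predictions = [
    [1.05677405e-08, 9.99999642e-01, 3.95520459e-07],
    [1.11955103e-08, 9.99999642e-01, 4.14905685e-07],
    [1.02637095e-07, 9.99997497e-01, 2.43751242e-06],
]
# One-hot training labels from the question.
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]


def argmax(row):
    """Return the index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)


predicted_classes = [argmax(p) for p in predictions]
true_classes = [argmax(t) for t in labels]

# Fraction of rows where the predicted class matches the true class.
matches = sum(p == t for p, t in zip(predicted_classes, true_classes))
accuracy = matches / len(labels)

print(predicted_classes)  # every row picks class index 1
print(accuracy)           # one of three images is right
```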

Why? How can that happen?

I don't think the images are relevant here. They are just grayscale images, about 110x90 in size. — Dvir Samuel, Nov 07 '17 at 14:10
https://imgur.com/a/XKlKA - 3 images I use to try overfitting — Dvir Samuel, Nov 07 '17 at 14:14
@DvirSamuel. Please post the images individually in your question. — Mad Physicist, Nov 07 '17 at 15:29
@MadPhysicist - thanks to WilmarvanOmmeren the question is updated — Dvir Samuel, Nov 07 '17 at 16:06
`ValueError: Input size must be at least 197x197`... which version of Keras are you using? — grovina, Nov 07 '17 at 16:40
@grovina I use Keras2.0.8, but I don't think the version is important.. make sure you resize the images like I did in the code above — Dvir Samuel, Nov 07 '17 at 16:48
@DvirSamuel the exception raises when trying to declare `ResNet50(input_shape=(140...` — grovina, Nov 07 '17 at 17:03
@grovina you are right!! I am sorry, change 140 to 197 and it should work.. look in the code above :) — Dvir Samuel, Nov 07 '17 at 17:09
Dude - it's not working - it expects image of size `(224, 224, 3)` with `include_top=True`. — Marcin Możejko, Nov 07 '17 at 22:45
Weird.. it works for me.. did you make sure that input_shape is (197,197,1) when creating ResNet50 object? — Dvir Samuel, Nov 08 '17 at 07:44

Yu-Yang · Accepted Answer · 2017-11-09T20:21:52.420

It's because of the batch normalization layers.

In the training phase, the batch is normalized w.r.t. its own mean and variance. However, in the testing phase, the batch is normalized w.r.t. the moving averages of the previously observed means and variances.
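
To make the two modes concrete, here is a minimal pure-Python sketch of batch normalization for a single feature. It is not the Keras implementation: `batch_norm`, the `eps` value, and the sample activations are assumptions chosen for illustration, and the learnable scale and shift are left out.

```python
def batch_norm(batch, training, moving_mean=0.0, moving_var=1.0, eps=1e-3):
    """Normalize a list of scalars roughly as a batch norm layer would (no gamma/beta)."""
    if training:
        # Training phase: use the statistics of the current batch.
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
    else:
        # Testing phase: use the stored moving averages instead.
        mean, var = moving_mean, moving_var
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]


# Hypothetical activations whose mean is far from the initial moving_mean of 0.
activations = [98.0, 100.0, 102.0]

train_out = batch_norm(activations, training=True)
test_out = batch_norm(activations, training=False)  # still uses mean=0, var=1

print(train_out)  # values centred around 0
print(test_out)   # values near 100: later layers see very different inputs
```

The same input therefore produces very different activations in the two phases until the moving averages catch up with the real batch statistics.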

Now this is a problem when the number of observed batches is small (e.g., 5 in your example) because in the BatchNormalization layer, by default moving_mean is initialized to be 0 and moving_variance is initialized to be 1.

Given also that the default momentum is 0.99, you'll need to update the moving averages many times before they converge to the "real" mean and variance.
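
The slow convergence can be simulated directly. The sketch below applies the update rule `moving = momentum * moving + (1 - momentum) * batch_value`, which is the form given in the Keras BatchNormalization documentation, to a constant batch mean. The value 100.0 and the helper name are made up for illustration.

```python
def simulate_moving_mean(batch_mean, momentum, num_updates, initial=0.0):
    """Apply the exponential moving-average update num_updates times."""
    moving = initial
    for _ in range(num_updates):
        moving = momentum * moving + (1.0 - momentum) * batch_mean
    return moving


true_mean = 100.0  # hypothetical batch mean, identical on every update

after_5 = simulate_moving_mean(true_mean, momentum=0.99, num_updates=5)
after_1000 = simulate_moving_mean(true_mean, momentum=0.99, num_updates=1000)

print(round(after_5, 2))     # roughly 4.9, i.e. under 5% of the true mean
print(round(after_1000, 2))  # very close to 100
```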

That's why the prediction is wrong in the early stage, but is correct after 1000 epochs.

You can verify it by forcing the BatchNormalization layers to operate in "training mode".

During training, the accuracy is 1 and the loss is close to zero:

model.fit(imgs,y,epochs=5,shuffle=True)
Epoch 1/5
3/3 [==============================] - 19s 6s/step - loss: 1.4624 - acc: 0.3333
Epoch 2/5
3/3 [==============================] - 0s 63ms/step - loss: 0.6051 - acc: 0.6667
Epoch 3/5
3/3 [==============================] - 0s 57ms/step - loss: 0.2168 - acc: 1.0000
Epoch 4/5
3/3 [==============================] - 0s 56ms/step - loss: 1.1921e-07 - acc: 1.0000
Epoch 5/5
3/3 [==============================] - 0s 53ms/step - loss: 1.1921e-07 - acc: 1.0000

Now if we evaluate the model, we'll observe high loss and low accuracy because after 5 updates, the moving averages are still pretty close to the initial values:

model.evaluate(imgs,y)
3/3 [==============================] - 3s 890ms/step
[10.745396614074707, 0.3333333432674408]

However, if we manually specify the "learning phase" variable and let the BatchNormalization layers use the "real" batch mean and variance, the result becomes the same as what's observed in fit().

sample_weights = np.ones(3)
learning_phase = 1  # 1 means "training"
ins = [imgs, y, sample_weights, learning_phase]
model.test_function(ins)
[1.192093e-07, 1.0]

It's also possible to verify it by changing the momentum to a smaller value.

For example, by adding momentum=0.01 to all the batch norm layers in ResNet50, the prediction after 20 epochs is:

model.predict(imgs)
array([[  1.00000000e+00,   1.34882026e-08,   3.92139575e-22],
       [  0.00000000e+00,   1.00000000e+00,   0.00000000e+00],
       [  8.70998792e-06,   5.31159838e-10,   9.99991298e-01]], dtype=float32)
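
A rough back-of-the-envelope calculation suggests why the smaller momentum helps. With a constant batch statistic, the gap between the moving average and the true value shrinks by a factor of `momentum` on each update, so the number of updates needed to get within a given tolerance follows from a logarithm. The function name and the 1% tolerance are illustrative assumptions.

```python
import math


def updates_needed(momentum, tolerance=0.01):
    """Smallest n with momentum**n <= tolerance (remaining fraction of the initial gap)."""
    return math.ceil(math.log(tolerance) / math.log(momentum))


print(updates_needed(0.99))  # hundreds of updates with the default momentum
print(updates_needed(0.01))  # a single update with momentum=0.01
```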

I tried manually specifying the learning phase without success as explained in [the update section of my question](https://stackoverflow.com/questions/55569181/why-is-accuracy-from-fit-generator-different-to-that-from-evaluate-generator-in). Did I do something wrong? — Sophie Crommelinck, Apr 11 '19 at 12:29
Also, could you provide an example of how you changed the `momentum` for all pre-trained batch norm layers in `ResNet50`? Am I getting it right that setting `momentum=0.01` should produce the same results during training and evaluation? Is this meant just for verification or as a setting for training? — Sophie Crommelinck, Apr 11 '19 at 12:29

Mike Chen · Answer 2 · 2020-10-12T13:41:49.447

Compared with EfficientNet (90% accuracy), ResNet50/101/152 gives quite poor predictions (15~50% accuracy) when using the pretrained weights provided by François Chollet. This is not related to the weights but to the inherent complexity of these models. In other words, it is necessary to re-train these models before predicting a given image, whereas EfficientNet does not need such training to predict an image.

For instance, given a classic cat image, the final results are as follows.

1. Using decode_predictions

from keras.applications.imagenet_utils import decode_predictions

Predicted: [[('n01930112', 'nematode', 0.122968934), ('n03041632', 'cleaver', 0.04236396), ('n03838899', 'oboe', 0.03846453), ('n02783161', 'ballpoint', 0.027445247), ('n04270147', 'spatula', 0.024508419)]]

2. Using cv2

import cv2
import numpy as np

img = cv2.resize(cv2.imread('/home/mike/Documents/keras_resnet_common/images/cat.jpg'), (224, 224)).astype(np.float32)

# Subtract the per-channel training-set means (cv2 loads images in BGR order)
img[:,:,0] -= 103.939
img[:,:,1] -= 116.779
img[:,:,2] -= 123.68

Predicted: [[('n04065272', 'recreational_vehicle', 0.46529356), ('n01819313', 'sulphur-crested_cockatoo', 0.31684962), ('n04074963', 'remote_control', 0.051597465), ('n02111889', 'Samoyed', 0.040776145), ('n04548362', 'wallet', 0.029898684)]]

Therefore, ResNet50/101/152 models are not suitable for predicting an image without training, even with the provided weights. Users can see their value after 100~1000 epochs of training, because that helps obtain a better moving average. If users want easy prediction, EfficientNet with the provided weights is a good choice.

score 0 · Answer 3 · answered Oct 12 '20 at 13:22

ResNet50V2 (the second version) has much higher accuracy than ResNet50 in predicting a given image, such as the classic Egyptian cat.

Predicted: [[('n02124075', 'Egyptian_cat', 0.8233388), ('n02123159', 'tiger_cat', 0.103765756), ('n02123045', 'tabby', 0.07267675), ('n03958227', 'plastic_bag', 3.6531426e-05), ('n02127052', 'lynx', 3.647774e-05)]]

score 0 · Answer 4 · answered Nov 17 '22 at 17:20

It seems that predicting with a batch of images will not work correctly in Keras. It is better to do prediction for each image individually and then calculate the accuracy manually. As an example, in the following code, I don't use batch prediction, but use individual image prediction.

import os
from PIL import Image
import keras
import numpy

###
# I am not including code to load models or train model
###

print("Prediction result:")
test_dir = "/path/to/test/images"
files = os.listdir(test_dir)
correct = 0
total = 0
# dictionary mapping class indices to readable labels
classes = {
    0:'This is Cat',
    1:'This is Dog',
}
for file_name in files:
    total += 1
    image = Image.open(os.path.join(test_dir, file_name)).convert('RGB')
    image = image.resize((100,100))
    image = numpy.expand_dims(image, axis=0)  # add a batch dimension of size 1
    image = numpy.array(image)
    image = image/255
    # predict_classes is only available on Sequential models in older Keras versions
    pred = model.predict_classes([image])[0]
    sign = classes[pred]
    # compare case-insensitively: the labels contain "Cat"/"Dog", not "cat"/"dog"
    if ("cat" in file_name.lower()) and ("cat" in sign.lower()):
        correct += 1
        print(correct, ". ", file_name, sign)
    elif ("dog" in file_name.lower()) and ("dog" in sign.lower()):
        correct += 1
        print(correct, ". ", file_name, sign)
print("accuracy: ", (correct/total))

score -4 · Answer 5 · answered May 30 '19 at 08:42

What basically happens is that model.fit() achieves the best fit at the cost of precision. Because precision is lost, the fitted model gives problematic and varying results. In other words, fit() only provides a good fit, not the required precision.
