I have been working on an image classifier and I would like to have a look at the images that the model misclassified on the validation set. My idea was to compare the true and predicted values and use the indices of the values that don't match to retrieve the images. However, when I recompute the accuracy from those labels and predictions, I don't get the same result I get with the evaluate method. This is what I have done:
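Just to make the goal concrete, the comparison I have in mind looks roughly like this once the labels and predictions line up (the arrays here are dummy placeholders, not my real data):
import numpy as np  # numpy and tensorflow are imported as np / tf throughout

y_true_example = np.array([3, 1, 2, 0])
y_pred_example = np.array([3, 2, 2, 0])
mismatch_idx = np.where(y_true_example != y_pred_example)[0]
print(mismatch_idx)  # indices where prediction and label differ, here [1]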
I import the data using this function:
def create_dataset(folder_path, name, split, seed, shuffle=True):
    return tf.keras.preprocessing.image_dataset_from_directory(
        folder_path, labels='inferred', label_mode='categorical', color_mode='rgb',
        batch_size=32, image_size=(320, 320), shuffle=shuffle, interpolation='bilinear',
        validation_split=split, subset=name, seed=seed)
train_set = create_dataset(dir_path, 'training', 0.1, 42)
valid_set = create_dataset(dir_path, 'validation', 0.1, 42)
# output:
# Found 16718 files belonging to 38 classes.
# Using 15047 files for training.
# Found 16718 files belonging to 38 classes.
# Using 1671 files for validation.
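As a quick sanity check on the loading, one batch of the validation set should look like this (just inspecting shapes; the exact numbers are what I expect from my settings):
for images, labels in valid_set.take(1):
    print(images.shape, labels.shape)  # expecting (32, 320, 320, 3) and (32, 38)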
Then to evaluate the accuracy on the validation set I use this line:
model.evaluate(valid_set)
# output:
# 53/53 [==============================] - 22s 376ms/step - loss: 1.1322 - accuracy: 0.7349
# [1.1321837902069092, 0.7348892688751221]
which is fine, since these values are exactly the same as the ones I got in the last epoch of training.
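As I understand it, evaluate returns the loss followed by the metrics in the order they were compiled, so the second number should be the accuracy and the pair can be unpacked directly:
val_loss, val_accuracy = model.evaluate(valid_set, verbose=0)
print(val_loss, val_accuracy)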
To extract the true labels from the validation set I use this line of code, based on this answer. Note that I need to create the validation set again, because every time I iterate over the variable that refers to it, the validation set gets shuffled. I thought this was the factor causing the inconsistent accuracy, but recreating the dataset didn't solve the problem.
y_val_true = np.concatenate([y for x, y in create_dataset(dir_path, 'validation', 0.1, 42)], axis=0)
y_val_true = np.argmax(y_val_true, axis=1)
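To illustrate what I mean by the reshuffling, this is the kind of check I have in mind: iterating twice over the same dataset object and comparing the label order (just a sanity check of my assumption, nothing the accuracy computation relies on):
ds = create_dataset(dir_path, 'validation', 0.1, 42)
labels_pass_1 = np.concatenate([np.argmax(y, axis=1) for x, y in ds], axis=0)
labels_pass_2 = np.concatenate([np.argmax(y, axis=1) for x, y in ds], axis=0)
print(np.mean(labels_pass_1 == labels_pass_2))  # well below 1.0 if the order changes between passes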
I make the prediction:
y_val_pred = model.predict(create_dataset(dir_path, 'validation', 0.1, 42))
y_val_pred = np.argmax(y_val_pred, axis=1)
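At this point both arrays should have one entry per validation image, so at least the lengths ought to match (using the counts from the output above):
print(y_val_true.shape, y_val_pred.shape)  # I expect both to be (1671,)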
And finally I compute the accuracy once again to verify that everything is OK:
m = tf.keras.metrics.Accuracy()
m.update_state(y_val_true, y_val_pred)
m.result().numpy()
# output:
# 0.082585275
As you can see, instead of getting the same value I got from the evaluate method, now I get only about 8%.
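For what it's worth, my understanding of tf.keras.metrics.Accuracy is that it simply counts element-wise matches between the two arrays, so the metric itself should not be the problem; a toy example of how I expect it to behave:
m_check = tf.keras.metrics.Accuracy()
m_check.update_state([1, 2, 3, 4], [1, 2, 0, 4])
print(m_check.result().numpy())  # 0.75, i.e. three of the four positions match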
I would be truly grateful if you could point out where my approach is flawed. Since this is the first question I have posted, I apologize in advance for any mistakes I have made.