
I have been working on an image classifier and I would like to have a look at the images that the model misclassified in the validation set. My idea was to compare the true and predicted values and use the indices of the values that don't match to retrieve the images. However, when I try to compute the accuracy this way, I don't get the same result that the evaluate method gives. This is what I have done:

I import the data using this function:

import numpy as np
import tensorflow as tf

def create_dataset(folder_path, name, split, seed, shuffle=True):
  return tf.keras.preprocessing.image_dataset_from_directory(
    folder_path, labels='inferred', label_mode='categorical', color_mode='rgb',
    batch_size=32, image_size=(320, 320), shuffle=shuffle, interpolation='bilinear',
    validation_split=split, subset=name, seed=seed)

train_set = create_dataset(dir_path, 'training', 0.1, 42)
valid_set = create_dataset(dir_path, 'validation', 0.1, 42)

# output:
# Found 16718 files belonging to 38 classes.
# Using 15047 files for training.
# Found 16718 files belonging to 38 classes.
# Using 1671 files for validation.

Then to evaluate the accuracy on the validation set I use this line:

model.evaluate(valid_set)

# output:
# 53/53 [==============================] - 22s 376ms/step - loss: 1.1322 - accuracy: 0.7349
# [1.1321837902069092, 0.7348892688751221]

which is fine, since the values are exactly the same as those I got in the last epoch of training.

To extract the true labels from the validation set I use this line of code, based on this answer. Note that I need to create the validation set again because every time I iterate over the variable that refers to the validation set, it gets shuffled. I thought this shuffling was the cause of the inconsistent accuracy, but recreating the dataset apparently didn't solve the problem (see the quick check after the snippet below).

y_val_true = np.concatenate([y for x, y in create_dataset(dir_path, 'validation', 0.1, 42)], axis=0)
y_val_true = np.argmax(y_val_true, axis=1)
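
For reference, a minimal check along the following lines (just a sketch that reuses the create_dataset helper above; the variable names are purely illustrative) confirms that iterating the shuffled dataset twice returns the labels in a different order:

# Sketch: two passes over the same shuffled dataset yield different label orders,
# since the dataset is reshuffled on every iteration.
check_ds = create_dataset(dir_path, 'validation', 0.1, 42)
labels_first_pass = np.concatenate([y for _, y in check_ds], axis=0)
labels_second_pass = np.concatenate([y for _, y in check_ds], axis=0)
print(np.array_equal(labels_first_pass, labels_second_pass))  # typically False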

I make the prediction:

y_val_pred = model.predict(create_dataset(dir_path, 'validation', 0.1, 42))
y_val_pred = np.argmax(y_val_pred, axis=1)

And finally I compute the accuracy once again to verify that everything is OK:

m = tf.keras.metrics.Accuracy()
m.update_state(y_val_true, y_val_pred)
m.result().numpy()

# output:
# 0.082585275

As you can see, instead of getting the same value I got when I ran the evaluate method, now I only get about 8%.

I would be truly grateful if you could point out where my approach is flawed. And since this is the first question I have posted, I apologize in advance for any mistakes I have made.

mp97
  • Does this answer your question? https://stackoverflow.com/a/65346147/9215780 – Innat Mar 20 '21 at 19:24
  • Yes and no. This is the same approach I have been trying to implement, but in my case I need to get the labels out of the tf dataset, and when I try to do it the labels get shuffled. As a result, the true labels and the predicted ones don't match. That's why I get the incorrect accuracy value. Or at least that's what I think is happening. Anyway, thanks a lot for the answer @M.Innat – mp97 Mar 20 '21 at 20:56

1 Answer


This method can help provide insights if you want to display or analyse the results batch by batch:

m = tf.keras.metrics.Accuracy()

# Iterating over individual batches to keep track of the images
# being fed to the model.
for valid_images, valid_labels in valid_set.as_numpy_iterator():
    y_val_true = np.argmax(valid_labels, axis=1)

    # Model can take inputs other than dataset as well. Hence, after images
    # are collected you can give them as input.
    y_val_pred = model.predict(valid_images)
    y_val_pred = np.argmax(y_val_pred, axis=1)
   
    # Update the state of the accuracy metric after every batch
    m.update_state(y_val_true, y_val_pred)

m.result().numpy()
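
Since the original goal was to look at the misclassified images, the same loop can also collect them. This is only a sketch, assuming the misclassified images fit in memory; the misclassified_* names are purely illustrative:

misclassified_images = []
misclassified_true = []
misclassified_pred = []

for valid_images, valid_labels in valid_set.as_numpy_iterator():
    y_val_true = np.argmax(valid_labels, axis=1)
    y_val_pred = np.argmax(model.predict(valid_images), axis=1)

    # Keep only the images whose predicted class differs from the true class.
    wrong = np.nonzero(y_val_true != y_val_pred)[0]
    misclassified_images.append(valid_images[wrong])
    misclassified_true.append(y_val_true[wrong])
    misclassified_pred.append(y_val_pred[wrong])

misclassified_images = np.concatenate(misclassified_images, axis=0)
misclassified_true = np.concatenate(misclassified_true, axis=0)
misclassified_pred = np.concatenate(misclassified_pred, axis=0)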

If you want to feed the whole validation set at once:

valid_ds = create_dataset(dir_path, 'validation', 0.1, 42, shuffle=False)
y_val_true = np.concatenate([y for x, y in valid_ds], axis=0)
y_val_true = np.argmax(y_val_true, axis=1)
y_val_pred = model.predict(valid_ds)
y_val_pred = np.argmax(y_val_pred, axis=1)

m = tf.keras.metrics.Accuracy()
m.update_state(y_val_true, y_val_pred)
m.result().numpy()
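
This works because with shuffle=False both the label extraction and model.predict see the files in the same fixed order, so the two arrays line up. As a quick sanity check (just a sketch on the same unshuffled dataset), the result should closely match what model.evaluate reports:

# Should report roughly the same accuracy as the manual computation above.
model.evaluate(valid_ds)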

I couldn't find the bug in your code though.

ranka47
  • Thank you for answering my question. The first block of code you posted definitely solved my problem. I didn't realize I could compute the accuracy batch by batch, and this approach overcomes the mismatch issue. Regarding the second approach, unfortunately I had already tested it and it doesn't work, since I need to shuffle the data set, otherwise I get poor performance. Besides, if you don't use np.argmax, you need to use tf.keras.metrics.CategoricalAccuracy to correctly measure the accuracy. Bye and thank you – mp97 Mar 22 '21 at 13:20
  • Thanks for pointing it out. I forgot to add `argmax`. I understand the issue with `shuffle` while training; however, what is the issue when validating? If the model trained well, then it should not give poor performance with `shuffle` set to False. – ranka47 Mar 22 '21 at 13:50
  • I perfectly agree with you, but for some reason I haven't been able to understand, this procedure doesn't work. If I create the validation set with `shuffle=False`, then it only contains instances from the last classes in the overall set. I find this result quite strange, given that I keep `shuffle=True` for the training set. I assume there is a mistake in the way I implemented the code. – mp97 Mar 24 '21 at 20:20