
Colab code is here:

I am following the docs here to get F1 results for a multiclass prediction.

When I train using

#last layer
tf.keras.layers.Dense(2, activation='softmax')

model.compile(optimizer="adam",
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=[tf.keras.metrics.CategoricalAccuracy(),
                       tfa.metrics.F1Score(num_classes=2, average='macro')])

I get

144/144 [==] - 8s 54ms/step - loss: 0.0613 - categorical_accuracy: 0.9789 - f1_score: 0.9788 - val_loss: 0.0826 - val_categorical_accuracy: 0.9725 - val_f1_score: 0.9722

When I do:

model.evaluate(val_ds)

I get

16/16 [==] - 0s 15ms/step - loss: 0.0826 - categorical_accuracy: 0.9725 - f1_score: 0.9722
[0.08255868405103683, 0.9725490212440491, 0.9722140431404114]

I would like to use `metric.result()` as shown on the official website. When I run the code below, I get 0.4875028, which is wrong. How can I get the correct `predicted_categories` and `true_categories`?

metric = tfa.metrics.F1Score(num_classes=2, average='macro')

predicted_categories = model.predict(val_ds)
true_categories = tf.concat([y for x, y in val_ds], axis=0).numpy() 

metric.update_state(true_categories, predicted_categories)
result = metric.result()
print(result.numpy())

#0.4875028

Here is how I loaded my data

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    main_folder,
    validation_split=0.1,
    subset="training",
    label_mode='categorical',
    seed=123,
    image_size=(dim, dim))

val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    main_folder,
    validation_split=0.1,
    subset="validation",
    label_mode='categorical',
    seed=123,
    image_size=(dim, dim))
Joseph Adam
  • It's difficult to answer this without knowing what `val_ds` is. – o-90 Mar 01 '21 at 13:18
  • @gobrewers14 I created a colab :) I hope you are able to find what noob mistake I am making https://colab.research.google.com/drive/1XhVpnjhpvtDq3kjZJ4_vjeAhoYQh9XsR?usp=sharing – Joseph Adam Mar 01 '21 at 14:14
  • My [answer](https://stackoverflow.com/questions/66386561/keras-classification-report-accuracy-is-different-between-model-predict-accurac/66425032#66425032) to your other question basically answers this one too. `predict` is shuffling your dataset. – o-90 Mar 01 '21 at 15:48

1 Answer


From: https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image_dataset_from_directory

tf.keras.preprocessing.image_dataset_from_directory(
    directory, labels='inferred', label_mode='int',
    class_names=None, color_mode='rgb', batch_size=32, image_size=(256,
    256), shuffle=True, seed=None, validation_split=None, subset=None,
    interpolation='bilinear', follow_links=False
)

`shuffle` defaults to `True`, and that is a problem for your `val_ds`: a shuffled dataset is reshuffled on every pass, so `model.predict(val_ds)` and the labels you collect in a separate pass over `val_ds` come out in different orders. Pairing them up then compares predictions against the wrong labels, which is why the manually computed F1 score collapses toward chance.

The correct metrics are the ones reported during training and by `model.evaluate`. To reproduce them manually, either recreate the validation dataset with `shuffle=False`, or collect the images and labels in a single pass and predict on the collected images, so predictions and labels stay aligned (not necessarily via `flow_from_directory()`).
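A minimal sketch of the single-pass approach, using a synthetic in-memory dataset as a stand-in for `val_ds` (the array shapes and names here are illustrative, not from the question): each sample's features encode its class, so we can verify that labels and inputs collected in the same pass remain paired even though the dataset reshuffles on every iteration.

```python
import numpy as np
import tensorflow as tf

# Synthetic 2-class stand-in for val_ds: row i of x is filled with
# its own label value, so pairing can be checked afterwards.
labels = np.arange(20) % 2
x = np.repeat(labels.astype("float32"), 4).reshape(20, 4)
y = tf.keras.utils.to_categorical(labels, num_classes=2)

# reshuffle_each_iteration=True mimics image_dataset_from_directory's
# default shuffle=True: every pass over the dataset has a new order,
# so model.predict(ds) and a separate label pass would disagree.
ds = (tf.data.Dataset.from_tensor_slices((x, y))
      .shuffle(20, seed=1, reshuffle_each_iteration=True)
      .batch(4))

# Fix: collect inputs and labels in the SAME pass, then run the model
# on the collected inputs instead of on the dataset object.
xs, ys = [], []
for xb, yb in ds:
    xs.append(xb)
    ys.append(yb)
x_eval = tf.concat(xs, axis=0).numpy()
true_categories = tf.concat(ys, axis=0).numpy()

# Every feature row still matches its one-hot label.
aligned = bool(np.all(x_eval[:, 0] == true_categories.argmax(axis=1)))
print(aligned)  # True
```

With a real model you would then call `metric.update_state(true_categories, model.predict(x_eval))`, and `metric.result()` should match the value `model.evaluate` reports.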

Timbus Calin