
I am trying to match the accuracy of a model.predict() call to the final val_accuracy of model.fit(). I am using tf.data datasets.

val_ds = tf.keras.utils.image_dataset_from_directory(
    'my_path',
    validation_split=0.2,
    subset="validation",
    seed=38,
    image_size=(SIZE, SIZE),
)

The dataset setup for train_ds is similar. I prefetch both...

train_ds = train_ds.prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.prefetch(buffer_size=AUTOTUNE)

Then I get the labels for val_ds so I can use them later:

true_categories = tf.concat([y for x, y in val_ds], axis=0)

My model

inputs = tf.keras.Input(shape=(SIZE, SIZE, 3))
# ... some other layers
outputs = tf.keras.layers.Dense(len(CLASS_NAMES), activation=tf.keras.activations.softmax)(intermediate)
model = tf.keras.Model(inputs, outputs)

Compiles fine

model.compile(
  optimizer='adam',
  loss=tf.keras.losses.SparseCategoricalCrossentropy(),
  metrics=['accuracy'])

Seems to fit fine

history = model.fit(
  train_ds,
  validation_data=val_ds,
  epochs=10, 
  class_weight=class_weights)  # I weight the classes due to imbalance

The last epoch output

Epoch 10: val_accuracy did not improve from 0.92291
176/176 [==============================] - 191s 1s/step - loss: 0.9876 - accuracy: 0.7318 - val_loss: 0.4650 - val_accuracy: 0.8580

Now I want to verify that val_accuracy == 0.8580 when I run model.predict():

from sklearn import metrics

predictions = model.predict(val_ds, verbose=2)
flattened_predictions = predictions.argmax(axis=1)
accuracy = metrics.accuracy_score(true_categories, flattened_predictions)
print("Accuracy =", accuracy)

Accuracy = 0.7980014275517487

I would have expected that to equal the last val_accuracy, which was 0.8580, but it is off. My val_ds uses a seed, so I should be getting the images in the same order when they are shuffled, right? Getting ground-truth labels is a pain when using datasets, but I think (???) my method is correct.

I only have two classes, and when I look at my predictions variable it looks like I am getting probabilities as I would expect, so I think I set up, compiled, and fit my model correctly for sparse categorical cross-entropy with softmax on the final layer.

predictions[:3]  # show the first 3 predictions; the values sum to 1.0 as expected

array([[0.42447385, 0.5755262 ],
       [0.2162129 , 0.7837871 ],
       [0.31917858, 0.6808214 ]], dtype=float32)

What am I missing?

honkskillet

1 Answer


What you are missing is that your validation dataset is shuffled at every iteration.

tf.keras.utils.image_dataset_from_directory has shuffle=True by default, and the shuffle method of a TensorFlow dataset has an argument reshuffle_each_iteration, which is None by default and is treated as True. Therefore the dataset is reshuffled every time you iterate over it.

The seed=38 parameter is used for tracking which samples are reserved for training and which for validation. In other words, with the seed argument we can follow which samples will end up in the validation dataset and vice versa.

As an example:

import tensorflow as tf

dataset = tf.data.Dataset.range(6)
dataset = dataset.shuffle(6, reshuffle_each_iteration=None, seed=154).batch(2)

print("First time iteration:")
for x in dataset:
    print(x)
print("\n")

print("Second time iteration")  
for x in dataset:
    print(x)

This will print:

First time iteration:
tf.Tensor([2 1], shape=(2,), dtype=int64)
tf.Tensor([3 0], shape=(2,), dtype=int64)
tf.Tensor([5 4], shape=(2,), dtype=int64)


Second time iteration:
tf.Tensor([4 3], shape=(2,), dtype=int64)
tf.Tensor([0 5], shape=(2,), dtype=int64)
tf.Tensor([2 1], shape=(2,), dtype=int64)
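
For contrast, here is a minimal sketch of the same toy example with reshuffle_each_iteration=False: with a fixed seed the shuffle order is computed once, so every iteration yields the same batches in the same order.

import tensorflow as tf

dataset = tf.data.Dataset.range(6)
dataset = dataset.shuffle(6, reshuffle_each_iteration=False, seed=154).batch(2)

# Both loops print the batches in the same (shuffled) order, because
# the order is no longer recomputed on each iteration.
for x in dataset:
    print(x)
for x in dataset:
    print(x)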

Relevant source code for tf.keras.utils.image_dataset_from_directory can be found here.

If you want to match predictions with their respective labels, then you can loop over the dataset:

import numpy as np

predictions = []
labels = []
# Collect predictions and labels in the same pass over val_ds, so the
# per-iteration shuffle cannot put them out of sync.
for x, y in val_ds:
    predictions.append(np.argmax(model(x), axis=-1))
    labels.append(y.numpy())

predictions = np.concatenate(predictions, axis=0)
labels = np.concatenate(labels, axis=0)

Then you can check accuracy.
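
For example, a quick check with scikit-learn, assuming the predictions and labels arrays from the loop above:

from sklearn import metrics

# predictions and labels are aligned because they were collected in the
# same pass over val_ds.
accuracy = metrics.accuracy_score(labels, predictions)
print("Accuracy =", accuracy)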

Frightera
  • OK. That makes sense. Then how do you get labels to use with model.predict() when using a tf dataset? – honkskillet Aug 26 '22 at 16:50
  • If you are willing to use `tf.keras.utils.image_dataset_from_directory`, you need to set `shuffle=False` or loop over the dataset while collecting both predictions and labels. Otherwise, if you build the tf.data dataset yourself, you can use `model.predict()` as long as you set `reshuffle_each_iteration=False` in the `shuffle()` method (a sketch follows these comments). – Frightera Aug 26 '22 at 16:53
  • Thank you. To make my code work I simply set `shuffle=False` when setting up my validation dataset. – honkskillet Aug 26 '22 at 17:55
  • Oops. I don't think that really worked either, because now my val_ds, without the shuffle, looks like it is all of one class. I might have to build the tf.dataset by myself as you mentioned. – honkskillet Aug 26 '22 at 19:08
  • It should have actually, can you confirm it with `np.unique()`? Also I made a mistake in the loop part, I'll fix it and you can also try that. – Frightera Aug 26 '22 at 19:12
  • Well, I left `shuffle=True` for the training set and set `shuffle=False` for the validation set. I'm thinking image_dataset_from_directory then gives me a nicely shuffled training set and a block of mostly one class for validation. It does solve my initial problem, because when I run model.predict(val_ds) I get the same validation dataset in the same order as in my last model.fit() epoch, but the validation set is all one class. I confirmed with `confusion_matrix()` ==> `[[ 0, 0], [640, 761]]` – honkskillet Aug 26 '22 at 19:58
  • Apologies, the answer I posted (suggesting that class_weight affected the accuracy metric during training) was incorrect, and I'm now voting to delete it. I'm not sure if I can unilaterally delete it. Thanks to @honkskillet for pointing this out. – David Harris Aug 27 '22 at 06:10
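
A sketch of the "build the tf.data dataset yourself" option mentioned in the comments above. Here val_paths and val_labels are placeholders for your own class-balanced validation split (lists of file paths and integer labels), SIZE is the image size from the question, and the JPEG decoding and batch size of 32 are assumptions to adapt to your data:

import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

def load_image(path, label):
    # Decode one image file and resize it to the model's input size.
    image = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
    image = tf.image.resize(image, (SIZE, SIZE))
    return image, label

# val_paths and val_labels (assumed) hold your own stratified split of
# file paths and integer labels, so the validation set is not one class.
val_ds = (
    tf.data.Dataset.from_tensor_slices((val_paths, val_labels))
    .shuffle(len(val_paths), seed=38, reshuffle_each_iteration=False)
    .map(load_image, num_parallel_calls=AUTOTUNE)
    .batch(32)
    .prefetch(AUTOTUNE)
)

# model.predict(val_ds) now sees the same order on every pass, so labels
# collected separately (e.g. with tf.concat) line up with the predictions.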