
The TensorFlow tutorial at this link (see the "Train the model" section) shows that it is possible to provide a dataset of (data, label) pairs to tf.keras.Model.fit. However, when I assign my dataset to validation_data, as shown below, the loss on the validation data is 0, irrespective of the state of the training (see graph below).

history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=EPOCHS,
    verbose=1,
)

IMAGE: Result of training from history, the validation loss is stuck at 0

This issue shows that the problem occurs when validation_data is passed as a list [x_val, y_val] instead of a tuple (x_val, y_val). In my case, however, I am providing a dataset, exactly as the TensorFlow tutorial above does.

I am using TensorFlow 2.1.

My dataset contains tuples structured as described below:

(<tf.Tensor: shape=(batch_size, 128, 128, 3), dtype=uint8, numpy=
array>, <tf.Tensor: shape=(batch_size, 10), dtype=float32, numpy=
array>)

The first tensor holds the images, and the second holds the labels (in this case, 10 values per image).
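For reference, here is a minimal sketch of one such dataset element, using NumPy arrays as stand-ins for the tensors (the concrete batch_size of 32 is an assumption; the question leaves it symbolic):

```python
import numpy as np

batch_size = 32  # assumed for illustration; the question leaves it symbolic

# One (images, labels) element, matching the shapes and dtypes above
images = np.zeros((batch_size, 128, 128, 3), dtype=np.uint8)  # image batch
labels = np.zeros((batch_size, 10), dtype=np.float32)         # 10 values per image
element = (images, labels)

print(element[0].shape, element[0].dtype)  # (32, 128, 128, 3) uint8
print(element[1].shape, element[1].dtype)  # (32, 10) float32
```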

Does anyone know why I get constant 0 validation loss in the history? How can I make it work? --> see update below

UPDATE:

It seems that this problem only happens when the validation batch size differs from the training batch size. I also noticed that for some batch sizes the validation loss is not exactly zero, but it is still close to zero and much lower than the training loss. If the batch size of the validation dataset is set equal to the batch size of the training dataset, everything works correctly. I am now wondering why this happens.

hammockman
  • Try feeding train_ds as validation_data and val_ds as training data – Andrey Nov 04 '20 at 20:19
  • Hi @Andrey, thank you for the smart idea. Unfortunately, the same problem persists. The training loss (obtained using val_ds) changes as expected, but the validation loss (obtained using train_ds) is stuck at 0. Therefore the issue is not in the data but in the way tf.keras.Model.fit() reads the validation_data input. – hammockman Nov 05 '20 at 12:10

1 Answer


I found out the problem!

I was using a custom loss function that returns the sum of the losses computed on each element of the batch. For example, with a batch_size of 100, the loss function returns the sum of those 100 per-example losses. Since I was using a validation dataset with a batch_size of 1 (much smaller than the training batch_size), the loss on the validation set was much smaller than the loss on the training set!
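This batch-size dependence can be reproduced without TensorFlow: a sum-reduced loss grows linearly with the batch size, while a mean-reduced one does not (the usual fix is to return the batch mean, e.g. via tf.reduce_mean instead of tf.reduce_sum in the custom loss). A minimal pure-Python sketch, using a made-up constant per-example loss for illustration:

```python
# Hypothetical constant per-example loss, for illustration only
PER_EXAMPLE_LOSS = 0.5

def summed_loss(batch_size):
    """Sum-reduced batch loss: grows linearly with the batch size."""
    return PER_EXAMPLE_LOSS * batch_size

def mean_loss(batch_size):
    """Mean-reduced batch loss: independent of the batch size."""
    return summed_loss(batch_size) / batch_size

train_bs, val_bs = 100, 1
print(summed_loss(train_bs), summed_loss(val_bs))  # 50.0 0.5 -> val looks ~0
print(mean_loss(train_bs), mean_loss(val_bs))      # 0.5 0.5 -> comparable
```

With sum reduction, the validation loss at batch_size=1 is two orders of magnitude below the training loss at batch_size=100 even though the per-example loss is identical, which matches the behavior described in the update above.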

So that explains it! :)
