
I encountered something strange in a testing-on-training experiment: the val_loss is completely different from the training loss, even though both are evaluated on the exact same data (X, Y) with the same batch_size.

Below is the code I used to train on a single batch:

# draw a single batch from the validation generator
X, Y = valid_datagen.next()
# use the whole batch as the batch size, so fit() runs exactly one step
batch_size = len(X[0])
joint_model.fit(X, Y,
                batch_size=batch_size,
                epochs=1,
                verbose=1,
                validation_data=(X, Y))

Train on 12 samples, validate on 12 samples
Epoch 1/1
12/12 [==============================] - 38s 3s/step
 - loss: 0.7510 - q_mask_a_loss: 0.4739 - r_mask_a_loss: 0.6610 - q_mask_b_loss: 0.4718 - r_mask_b_loss: 0.3164 - pred_a_loss: 1.8092 - pred_b_loss: 0.2238
 - q_mask_a_F1: 0.8179 - r_mask_a_F1: 0.5318 - q_mask_b_F1: 0.8389 - r_mask_b_F1: 0.6134 - pred_a_acc: 0.0833 - pred_b_acc: 1.0000
 - val_loss: 7.0257 - val_q_mask_a_loss: 6.9748 - val_r_mask_a_loss: 14.9849 - val_q_mask_b_loss: 6.9748 - val_r_mask_b_loss: 14.9234 - val_pred_a_loss: 0.6919 - val_pred_b_loss: 0.6944
 - val_q_mask_a_F1: 0.0000e+00 - val_r_mask_a_F1: 0.0000e+00 - val_q_mask_b_F1: 0.0000e+00 - val_r_mask_b_F1: 0.0000e+00 - val_pred_a_acc: 1.0000 - val_pred_b_acc: 0.0000e+00

Note:

  1. The training loss is 0.7510, while val_loss is 7.0257.
  2. I have already set batch_size equal to the number of samples, i.e. the model trains on exactly one batch.
  3. I am using Keras 2.2.0 with the TensorFlow 1.5.0 backend.
  4. Calling joint_model.evaluate(X, Y, batch_size=batch_size) gives the same result as the validation pass.

As for joint_model, it is nothing but a feed-forward CNN with frozen weights in its first several layers; there is no Dropout layer anywhere.

I have absolutely no idea what is going on here. Does anyone know what the potential reasons are, or how to debug this? Any suggestions are welcome.
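One way I could probe this (a sketch, not tested against the actual joint_model, and assuming X is the list of input arrays drawn above) is to run the model with the Keras learning phase set explicitly, since fit() computes its metrics in training phase while validation and evaluate() run in test phase:

import keras.backend as K

# build a function that runs the model with an explicit learning phase
run_model = K.function(joint_model.inputs + [K.learning_phase()],
                       joint_model.outputs)

train_out = run_model(X + [1])  # learning phase 1: same path as fit()
test_out = run_model(X + [0])   # learning phase 0: same path as evaluate()

# large differences between train_out and test_out would point at layers
# that behave differently per phase (e.g. BatchNormalization or Dropout)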

  • Is there any batch normalization in the model? – Abhishek Mishra Jun 24 '18 at 05:11
  • I can look into your model if you could provide somehow. – Abhishek Mishra Jun 24 '18 at 05:13
  • @Abhishek Mishra Yes, I do have several `BatchNormalization` layers, but the same behavior is observed whether I set their weights `trainable=True` or `False`. I am now making a Google Colab notebook for you; once it is done, I will update its link here. – pitfall Jun 24 '18 at 05:22
  • This might be useful: [ResNet: 100% accuracy during training, but 33% prediction accuracy with the same data](https://stackoverflow.com/questions/47157526/resnet-100-accuracy-during-training-but-33-prediction-accuracy-with-the-same) – desertnaut Jun 24 '18 at 12:39
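Following the BatchNormalization hint in the comments above, here is a minimal, self-contained sketch (a hypothetical toy model, not the actual joint_model) of how a fresh BatchNormalization layer alone can produce this fit/evaluate gap: the training loss is computed from per-batch statistics, while the validation loss uses the moving mean and variance, which after a single epoch are still close to their initial values.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization

# toy single-batch setup with made-up shapes
model = Sequential([
    Dense(8, activation='relu', input_shape=(4,)),
    BatchNormalization(),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='sgd', loss='binary_crossentropy')

X = np.random.rand(12, 4)
Y = np.random.randint(0, 2, size=(12, 1))

# loss is computed with batch statistics; val_loss with moving averages,
# so the two numbers can diverge sharply on the very same data
model.fit(X, Y, batch_size=12, epochs=1, validation_data=(X, Y))
print(model.evaluate(X, Y, batch_size=12))  # matches val_loss, not loss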
