I get very strange behavior when comparing model.evaluate() and model.predict() results. As you can see in the screenshot, the F1 computed from the precision and recall returned by model.evaluate() is ~0.926, but the F1 computed from the predictions made by model.predict() is much lower. Any ideas how this could happen?
This only happens when evaluating an out-of-sample dataset. For the test data used as validation data during training, model.evaluate() and model.predict() give the same F1.
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])
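
For reference, this is roughly how the F1 from model.predict() could be computed and compared against model.evaluate(). This is a minimal sketch: the names X_out / y_out are placeholders for the out-of-sample data, and the 0.5 decision threshold is an assumption.

import numpy as np
import tensorflow as tf

# X_out / y_out are placeholders for the out-of-sample features and labels.
# model.predict() returns probabilities, so they are thresholded (assumed 0.5)
# before computing precision/recall.
probs = model.predict(X_out)
preds = (probs >= 0.5).astype(int).ravel()

precision = tf.keras.metrics.Precision()
recall = tf.keras.metrics.Recall()
precision.update_state(y_out, preds)
recall.update_state(y_out, preds)

p = precision.result().numpy()
r = recall.result().numpy()
f1 = 2 * p * r / (p + r + 1e-7)
print(f"predict: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")

# For comparison, model.evaluate() reports precision/recall computed with its
# own internal 0.5 threshold, in the order given in metrics=[...] at compile time.
loss, acc, p_eval, r_eval = model.evaluate(X_out, y_out, verbose=0)
f1_eval = 2 * p_eval * r_eval / (p_eval + r_eval + 1e-7)
print(f"evaluate: precision={p_eval:.3f} recall={r_eval:.3f} f1={f1_eval:.3f}")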