
There should be someone who can really clarify this.

Here is some initial info from the Keras documentation: the fit function trains the model for a given number of epochs, and the evaluate function returns the loss value & metrics values for the model in test mode.

So both functions return a loss. To give an example: if I have a single training example, the loss I get from the fit function after each training step should be identical to the loss I get from the evaluate function (after the same training step). (The assumption here is that I run both fit and evaluate on the same training set, which consists of that one example only.)

I define my network as follows:

import numpy as np
from keras import backend as K
from keras.applications.resnet50 import ResNet50
from keras.layers import Input, Dense, Lambda
from keras.models import Model
from keras.optimizers import Adam

def identity_loss(y_true, y_pred):
    # the Lambda layer below already outputs the triplet loss, so just pass it through
    return K.mean(y_pred - 0 * y_true)

model = ResNet50(weights='imagenet')
model.layers.pop()
x = model.get_layer('flatten_1').output # layer 'flatten_1' is the last layer of the model
model_out = Dense(128, activation='relu',  name='model_out')(x)
model_out = Lambda(lambda  x: K.l2_normalize(x,axis=-1))(model_out)

new_model = Model(inputs=model.input, outputs=model_out)

anchor_input = Input(shape=(224, 224, 3), name='anchor_input')
pos_input = Input(shape=(224, 224, 3), name='pos_input')
neg_input = Input(shape=(224, 224, 3), name='neg_input')

encoding_anchor   = new_model(anchor_input)
encoding_pos      = new_model(pos_input)
encoding_neg      = new_model(neg_input)

loss = Lambda(triplet_loss)([encoding_anchor, encoding_pos, encoding_neg])
siamese_network = Model(inputs  = [anchor_input, pos_input, neg_input], 
                        outputs = loss) 
siamese_network.compile(loss=identity_loss, optimizer=Adam(lr=.00003))
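
The triplet_loss function itself is not shown above. Purely for illustration (an assumption on my part, with a hypothetical margin of 0.2), a common formulation would look roughly like this:

def triplet_loss(inputs, margin=0.2):
    # inputs is the list [encoding_anchor, encoding_pos, encoding_neg]
    anchor, positive, negative = inputs
    pos_dist = K.sum(K.square(anchor - positive), axis=-1, keepdims=True)
    neg_dist = K.sum(K.square(anchor - negative), axis=-1, keepdims=True)
    return K.maximum(pos_dist - neg_dist + margin, 0.0)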

Later on, I train on my training set (consisting of 1 example only) with the fit function for 10 epochs. Just to check the differences between the fit and evaluate functions, I also run the evaluate function right after the fit function in each epoch, and the output looks like the following:

nr_epoch:  0 

Epoch 1/1
1/1 [==============================] - 4s 4s/step - loss: 2.0035
1/1 [==============================] - 3s 3s/step
eval_score for train set:  2.0027356147766113

nr_epoch:  1 

Epoch 1/1
1/1 [==============================] - 1s 1s/step - loss: 1.9816
1/1 [==============================] - 1s 1s/step
eval_score for train set:  2.001833915710449

nr_epoch:  2 

Epoch 1/1
1/1 [==============================] - 1s 1s/step - loss: 1.9601
1/1 [==============================] - 1s 1s/step
eval_score for train set:  2.00126576423645

nr_epoch:  3 

Epoch 1/1
1/1 [==============================] - 1s 1s/step - loss: 1.9388
1/1 [==============================] - 1s 1s/step
eval_score for train set:  2.0009117126464844

nr_epoch:  4 

Epoch 1/1
1/1 [==============================] - 1s 1s/step - loss: 1.9176
1/1 [==============================] - 1s 1s/step
eval_score for train set:  2.000725746154785

nr_epoch:  5 

Epoch 1/1
1/1 [==============================] - 1s 1s/step - loss: 1.8964
1/1 [==============================] - 1s 1s/step
eval_score for train set:  2.0006520748138428

nr_epoch:  6 

Epoch 1/1
1/1 [==============================] - 1s 1s/step - loss: 1.8759
1/1 [==============================] - 1s 1s/step
eval_score for train set:  2.0006656646728516

nr_epoch:  7 

Epoch 1/1
1/1 [==============================] - 1s 1s/step - loss: 1.8555
1/1 [==============================] - 1s 1s/step
eval_score for train set:  2.0007567405700684

nr_epoch:  8 

Epoch 1/1
1/1 [==============================] - 1s 1s/step - loss: 1.8355
1/1 [==============================] - 1s 1s/step
eval_score for train set:  2.0009000301361084

nr_epoch:  9 

Epoch 1/1
1/1 [==============================] - 2s 2s/step - loss: 1.8159
1/1 [==============================] - 2s 2s/step
eval_score for train set:  2.001085042953491

As seen, the loss reported by the fit function (at the end of each epoch) is decreasing, while the loss coming from the evaluate function is simply not decreasing.

So the dilemma is: if I run my model on a single training example, should I not see the same loss (after each epoch) from both the fit and evaluate functions for that same epoch? If I keep training, the training loss keeps decreasing, but the loss coming from the evaluate function somehow remains at the same level and does not decrease.

And lastly, here is how I call the fit and evaluate functions:

z = np.zeros(len(anchor_path))

siamese_network.fit(x=[anchor_imgs, pos_imgs, neg_imgs], 
                    y=z, 
                    batch_size=batch_size, 
                    epochs=1, 
                    verbose=1, 
                    callbacks=None, 
                    validation_split=0.0, 
                    validation_data=None, 
                    shuffle=True, 
                    class_weight=None, 
                    sample_weight=None, 
                    initial_epoch=0, 
                    steps_per_epoch=None, 
                    validation_steps=None)

eval_score = siamese_network.evaluate(x=[anchor_imgs, pos_imgs, neg_imgs], 
                                      y=z,
                                      batch_size = batch_size, 
                                      verbose = 1)
print('eval_score for train set: ', eval_score)
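
(As an aside, the fit call also returns a History object, so the loss reported by fit can be read back and compared with the evaluate score directly. A sketch reusing the variable names above:)

history = siamese_network.fit(x=[anchor_imgs, pos_imgs, neg_imgs], y=z,
                              batch_size=batch_size, epochs=1, verbose=1)
fit_loss = history.history['loss'][-1]   # loss reported by fit for this epoch
eval_loss = siamese_network.evaluate(x=[anchor_imgs, pos_imgs, neg_imgs], y=z,
                                     batch_size=batch_size, verbose=1)
print('fit loss:', fit_loss, ' evaluate loss:', eval_loss)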

So, why does the loss decrease during the execution of the fit function but not the evaluate function? Where am I making a mistake?

edn
  • Some layers behave differently during training and inference. Most notably, dropout is turned off in the latter case. – Eli Korvigo Jul 07 '18 at 07:02
  • Thank you for your answer. I do not touch the dropout settings anywhere. Is there anything specific you would recommend to double-check? The training loss is decreasing rapidly, but the loss reported by the evaluate function is almost not changing at all. And there is only a single training example. It does not make sense to me... – edn Jul 07 '18 at 07:12
  • possibly relevant: [ResNet: 100% accuracy during training, but 33% prediction accuracy with the same data](https://stackoverflow.com/questions/47157526/resnet-100-accuracy-during-training-but-33-prediction-accuracy-with-the-same) – desertnaut Jul 07 '18 at 11:33

2 Answers


ResNet uses batch normalization, which doesn't behave the same during training and testing. Your assumption that you should get the same training loss from model.fit and model.evaluate is incorrect.
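
To see why, here is a minimal, self-contained sketch (independent of the model in the question): a lone BatchNormalization layer already reports different losses from fit and evaluate on exactly the same data, because fit normalizes with the current batch's statistics while evaluate uses the slowly updated moving averages.

import numpy as np
from keras.models import Sequential
from keras.layers import BatchNormalization

# A single BatchNormalization layer trained against an all-zero target with MSE.
bn = Sequential([BatchNormalization(input_shape=(4,))])
bn.compile(loss='mse', optimizer='sgd')

x = np.random.randn(8, 4) * 5 + 3   # data far from zero mean / unit variance
# fit: normalizes with the batch statistics, so the loss is small
print(bn.fit(x, np.zeros_like(x), epochs=1, verbose=0).history['loss'])
# evaluate: uses the moving averages, which have barely moved, so the loss is large
print(bn.evaluate(x, np.zeros_like(x), verbose=0))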

Dr. Snoopy
  • Thank you for your answer. I see your point. But what is the solution to this? How can I match the losses from the fit and evaluate functions (or at least make them as close as possible to each other)? (A side note: not mentioned above, but I run "layer.trainable = False" to freeze the weights of the original ResNet50. Can this affect it as well?) – edn Jul 07 '18 at 07:34
  • If you freeze the BN layers then it's definitely the same problem with broken BN, which is described in the links you posted in your answer. – Andrey Kite Gorin Jul 07 '18 at 08:49
  • @edn I don't think there is a problem, why do you want to have matching losses between fit and evaluate? – Dr. Snoopy Jul 07 '18 at 11:34
  • Yes, I recognize that there is no point in freezing BN layer parameters when doing transfer learning. Regarding @MatiasValdenegro's question: I first trained my model and saw a huge difference between the losses from the fit and evaluate functions. Then I trained the model with just 1 example (to check that everything was working as it should) but the difference between the losses was still there. Otherwise, I would not try to match the losses. – edn Jul 07 '18 at 16:05

With further research (googling with different keywords), I found the following information, which also provides solutions. Seemingly, many people have been suffering from this problem, particularly when trying to use transfer learning.

Here is a discussion and a solution to the problem: Strange behaviour of the loss function in keras model, with pretrained convolutional base

And here is a blogpost about this topic: http://blog.datumbox.com/the-batch-normalization-layer-of-keras-is-broken/
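
One workaround along the lines of those discussions is sketched below (an assumption on my part, not verified on my exact setup): build the frozen pretrained base with the learning phase fixed to "test", so that its BatchNormalization layers use the pretrained moving statistics during fit as well as during evaluate.

from keras import backend as K
from keras.applications.resnet50 import ResNet50

K.set_learning_phase(0)                    # build the frozen base in inference mode
base = ResNet50(weights='imagenet', include_top=False)
for layer in base.layers:
    layer.trainable = False

K.set_learning_phase(1)                    # layers created after this point train as usual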

I unfortunately think both TensorFlow and Keras have quite terrible documentation.

edn