Getting different results from Keras model.evaluate and model.predict

Question

I trained a model to predict topic categories using word2vec and an LSTM in Keras, and got about 98% accuracy during training. I saved the model, then loaded it in another file to try it on the test set. I ran both model.evaluate and model.predict, and the results were very different.

I'm using Keras with TensorFlow as the backend. The model summary is:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 22)                19624     
_________________________________________________________________
dropout_1 (Dropout)          (None, 22)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 40)                920       
_________________________________________________________________
activation_1 (Activation)    (None, 40)                0         
=================================================================
Total params: 20,544
Trainable params: 20,544
Non-trainable params: 0
_________________________________________________________________
None

The code:

import os
import numpy as np

# Compile the model and load the saved weights
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.load_weights(os.path.join('model', 'lstm_model_weights.hdf5'))

# Accuracy as reported by Keras
score, acc = model.evaluate(x_test, y_test, batch_size=batch_size)

print()
print('Score: %1.4f' % score)
print('Evaluation Accuracy: %1.2f%%' % (acc*100))

# Accuracy computed by hand from the predicted classes (argmax over the 40 outputs)
predicted = model.predict(x_test, batch_size=batch_size)
acc2 = np.count_nonzero(predicted.argmax(1) == y_test.argmax(1))/y_test.shape[0]
print('Prediction Accuracy: %1.2f%%' % (acc2*100))

The output of this code is:

39680/40171 [============================>.] - ETA: 0s  
Score: 0.1192
Evaluation Accuracy: 97.50%
Prediction Accuracy: 9.03%

Can anyone tell me what I missed?

What is the last layer's activation? Why are you using binary cross-entropy loss when your output layer has 40 neurons? — Mitiku, Jul 26 '19 at 06:50
Actually, that was it: I changed the loss and forgot about it. After changing it to categorical_crossentropy I got the same accuracy for both. Thanks, @Mitiku. Please confirm that you saw my comment, because I will probably delete the post. — Mina Melek, Jul 26 '19 at 14:25
This question might be useful if someone faces a similar problem in the future. Please consider that before deleting the question. — Mitiku, Jul 26 '19 at 14:31
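
To illustrate why the loss matters here: with binary_crossentropy, Keras resolves metrics=['accuracy'] to an element-wise binary accuracy, so each of the 40 outputs is scored as its own yes/no decision. The sketch below is a plain NumPy re-creation of that idea, not Keras's actual code. The function names and the toy data are hypothetical, chosen so that every prediction picks the wrong class.

```python
import numpy as np

def binary_accuracy(y_true, y_pred, threshold=0.5):
    # Element-wise: every output unit counts as a separate yes/no decision.
    return float(np.mean((y_pred > threshold) == (y_true > threshold)))

def categorical_accuracy(y_true, y_pred):
    # Row-wise: a sample is correct only if the arg-max class matches.
    return float(np.mean(y_pred.argmax(axis=1) == y_true.argmax(axis=1)))

# Toy data (assumed for illustration): 8 samples, 40 one-hot classes.
n_samples, n_classes = 8, 40
labels = np.arange(n_samples) % n_classes
y_true = np.eye(n_classes)[labels]

# A deliberately bad "model": it always favours the wrong class,
# and no probability exceeds the 0.5 threshold.
wrong = (labels + 1) % n_classes
y_pred = np.full((n_samples, n_classes), 0.7 / (n_classes - 1))
y_pred[np.arange(n_samples), wrong] = 0.3

bin_acc = binary_accuracy(y_true, y_pred)
cat_acc = categorical_accuracy(y_true, y_pred)
print('Binary accuracy: %1.2f%%' % (bin_acc * 100))       # 97.50%
print('Categorical accuracy: %1.2f%%' % (cat_acc * 100))  # 0.00%
```

Because 39 of the 40 targets in each row are zero, predicting "no" everywhere is already right 39/40 = 97.5% of the time, even though no sample is classified correctly. That is the same figure as the Evaluation Accuracy above, which is consistent with this explanation, although the thread does not spell it out in these terms. Switching the loss to categorical_crossentropy, as the asker did, makes the reported accuracy compare arg-max classes instead.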

score 0 · Answer 1 · answered Jul 21 '22 at 21:16

I think model.evaluate works on the dev set (or on the average dev-set accuracy if you use cross-validation), while model.predict works on the test set.

Arefeh Yavary

If you take a look at the code, you will find that both model.predict and model.evaluate work on the test dataset. – Fang WU, Sep 22 '22 at 08:30
