44

I am using Keras with TensorFlow backend to train CNN models.

What is the difference between model.fit() and model.evaluate()? Which one should I ideally use? (I am using model.fit() as of now.)

I know the utility of model.fit() and model.predict(). But I am unable to understand the utility of model.evaluate(). Keras documentation just says:

It is used to evaluate the model.

I feel this is a very vague definition.

nbro
Abhijit Balaji
  • Since originally asked, a lot has happened, including the docs significantly improving; so I'll include a link here to the Keras API for Tensorflow 2.x Python API for "Model": `compile` (Configures the model for training); `fit` (Trains the model for a fixed number of epochs); `evaluate` (Returns the loss value & metrics values for the model in test mode); `predict` (Generates output predictions for the input samples) https://www.tensorflow.org/api_docs/python/tf/keras/Model – michael Oct 05 '20 at 06:49

7 Answers

66

fit() is for training the model with the given inputs (and corresponding training labels).

evaluate() is for evaluating the already trained model using the validation (or test) data and the corresponding labels. Returns the loss value and metrics values for the model.

predict() is for the actual prediction. It generates output predictions for the input samples.

Let us consider a simple regression example:

# input and output
import numpy as np

x = np.random.uniform(0.0, 1.0, 200)
y = 0.3 + 0.6*x + np.random.normal(0.0, 0.05, len(x))

(Scatter plot of the generated x and y values.)

Now let's apply a regression model in Keras:

# A simple regression model
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(1, input_shape=(1,)))
model.compile(loss='mse', optimizer='rmsprop')

# The fit() method - trains the model
model.fit(x, y, epochs=1000, batch_size=100)

Epoch 1000/1000
200/200 [==============================] - 0s - loss: 0.0023

# The evaluate() method - gets the loss statistics
model.evaluate(x, y, batch_size=200)     
# returns: loss: 0.0022612824104726315

# The predict() method - predict the outputs for the given inputs
model.predict(np.expand_dims(x[:3],1)) 
# returns: [ 0.65680361],[ 0.70067143],[ 0.70482892]
nbro
Vijay Mariappan
  • even model.fit() returns the loss and acc right? I am bit fuzzy with model.fit() and model.evaluate() – Abhijit Balaji Jun 30 '17 at 10:27
  • fit() is for training a model. It produces metrics for the training set, whereas evaluate() is for testing the trained model on the test set. – Vijay Mariappan Jun 30 '17 at 11:36
  • What does `evaluate()` return if we don't pass any `x` and `y` parameters? As far as I am aware their default values are `None` which means they are not necessary. – Chhaganlaal Jun 25 '20 at 14:41
  • The fit() method also allows passing in the validation dataset with the training dataset. So if you train the model passing in both in the fit() method, is it expected that the accuracy and loss values will be the same when you run the evaluate() method on the validation dataset after the training is done? I'm just curious – ptn77 Apr 10 '22 at 01:43
  • @vijayachandran-mariappan I have gotten the `recall`, `accuracy`, `auc`, and `precision` from the keras compile block. It's really handy. How do we access each metric separately so we can use them? It seems they are not accessible via dot syntax or subscripting. – Edison Jul 11 '22 at 12:47
12

In Deep learning you first want to train your model. You take your data and split it into two sets: the training set, and the test set. It seems pretty common that 80% of your data goes into your training set and 20% goes into your test set.

Your training set gets passed into your call to fit() and your test set gets passed into your call to evaluate(). During the fit operation a number of rows of your training data are fed into your neural net (based on your batch size). After every batch is sent, the fit algorithm does backpropagation to adjust the weights in your neural net.

After this is done your neural net is trained. The problem is that sometimes your neural net becomes overfit: a condition where it performs well on the training set but poorly on other data. To guard against this situation you run the evaluate() function to send new data (your test set) through your neural net to see how it performs with data it has never seen. There is no training occurring; this is purely a test. If all goes well then the score from training is similar to the score from testing.
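The 80/20 split described above can be sketched in plain NumPy; the `fit()`/`evaluate()` calls are shown as comments since they assume a compiled Keras model named `model` (the variable names here are just for illustration):

```python
import numpy as np

# 200 samples of toy data
x = np.random.uniform(0.0, 1.0, 200)
y = 0.3 + 0.6 * x + np.random.normal(0.0, 0.05, 200)

# shuffle, then take 80% for training and 20% for testing
idx = np.random.permutation(len(x))
split = int(0.8 * len(x))
train_idx, test_idx = idx[:split], idx[split:]
x_train, y_train = x[train_idx], y[train_idx]
x_test, y_test = x[test_idx], y[test_idx]

print(len(x_train), len(x_test))  # 160 40

# model.fit(x_train, y_train, epochs=1000, batch_size=100)
# model.evaluate(x_test, y_test)
```

Shuffling before splitting matters: if the data is ordered (say, by class), a plain slice would put systematically different samples in the test set.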

rancidfishbreath
  • Imagine that I have done fit + evaluate + predict with data set #1, and now I want to predict with data set #2 without doing any fit. Is that possible? Do I still need to evaluate again too? – bardulia Jul 17 '23 at 14:45
4

fit(): Trains the model for a given number of epochs (this is for training time, with the training dataset).

predict(): Generates output predictions for the input samples (this is for inference time, once the model is trained).

evaluate(): Returns the loss value & metrics values for the model in test mode (this is for testing time, with the testing dataset).

tyncho08
2

While all the answers above explain what these functions (fit(), evaluate(), predict()) do, the more important point to keep in mind, in my opinion, is what data you should use for fit() and evaluate().

The clearest guideline I came across is in Machine Learning Mastery, in particular this quote:

Training set: A set of examples used for learning, that is to fit the parameters of the classifier.

Validation set: A set of examples used to tune the parameters of a classifier, for example to choose the number of hidden units in a neural network.

Test set: A set of examples used only to assess the performance of a fully-specified classifier.

By Brian Ripley, page 354, Pattern Recognition and Neural Networks, 1996

You should not use the same data that you used to train (tune) the model (the validation data) for evaluating the performance (generalization) of your fully trained model with evaluate().

The test data used for evaluate() should be unseen, i.e. not used during training (fit()), in order to be a reliable indicator of model performance (generalization).

For predict() you can use just one or a few examples that you choose (from anywhere) to get a quick check or answer from your model. I don't believe it can be used as the sole indicator of generalization.
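A minimal sketch of the three-way split from the quote, in plain NumPy (the 60/20/20 proportions are just an example, not from Ripley; the Keras calls are shown as comments since they assume a compiled model named `model`):

```python
import numpy as np

# 100 samples of toy data
x = np.random.uniform(0.0, 1.0, 100)
y = 0.3 + 0.6 * x

# shuffle, then split 60/20/20 into train/validation/test
idx = np.random.permutation(len(x))
n_train, n_val = int(0.6 * len(x)), int(0.2 * len(x))
train_idx = idx[:n_train]
val_idx = idx[n_train:n_train + n_val]
test_idx = idx[n_train + n_val:]

print(len(train_idx), len(val_idx), len(test_idx))  # 60 20 20

# fit() sees the training set (and can monitor the validation set);
# evaluate() sees the test set, exactly once, at the very end:
# model.fit(x[train_idx], y[train_idx],
#           validation_data=(x[val_idx], y[val_idx]))
# model.evaluate(x[test_idx], y[test_idx])
```

The key discipline is that the test indices never appear in any fit() call; otherwise the evaluate() score stops measuring generalization.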

Amsci Fi
  • Is there any reason to use `evaluate()`? Since in `fit()` you can already pass the validation dataset. And in `predict()` we would use the test dataset? – Murilo Dec 03 '21 at 14:19
2

One thing which was not mentioned here, I believe, needs to be specified. model.evaluate() returns a list which contains a loss figure and an accuracy figure. What has not been said in the answers above is that the "loss" figure is averaged over ALL the items in the x_test array, not taken from any single item. x_test would contain your test data and y_test would contain your labels.

0

To be precise, it is the mean of the losses incurred over all iterations, not the sum. But sure, that's the most important information here; otherwise the modeler would be slightly confused.
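A quick numerical sketch of that point in plain NumPy; the per-item squared errors here stand in for whatever loss the model was compiled with (the values are made up for illustration):

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

# per-item squared errors
per_item = (y_pred - y_true) ** 2

# what an 'mse' loss reports: the mean over all items, not the sum
reported = per_item.mean()

print(round(per_item.sum(), 6), round(reported, 6))  # 0.1 0.025
```

So with 4 items the summed loss (0.1) is 4 times the reported figure (0.025); the reported figure does not grow with the size of the test set.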

0

As per the documentation, model.evaluate() returns the test loss (lower is better) and "metrics".

Metrics over here refers to all metrics which were included when the model was compiled.

To get that info for your particular model, you can print(model.metrics_names).

(These metrics are not necessarily going to be included in every model though.)

Here's a list of metrics you can include when compiling the model.

Fit is the function you use after compilation in order to train the model.

Evaluate is only used once you are done training.

DonCarleone