
I don't understand which accuracy in the output to use to compare my 2 Keras models to see which one is better.

Do I use the "acc" (from the training data?) or the "val_acc" (from the validation data?)?

There are different acc and val_acc values for each epoch. How do I know the acc or val_acc for my model as a whole? Do I average the values across all epochs to get the accuracy of the model as a whole?

Model 1 Output

Train on 970 samples, validate on 243 samples
Epoch 1/20
0s - loss: 0.1708 - acc: 0.7990 - val_loss: 0.2143 - val_acc: 0.7325
Epoch 2/20
0s - loss: 0.1633 - acc: 0.8021 - val_loss: 0.2295 - val_acc: 0.7325
Epoch 3/20
0s - loss: 0.1657 - acc: 0.7938 - val_loss: 0.2243 - val_acc: 0.7737
Epoch 4/20
0s - loss: 0.1847 - acc: 0.7969 - val_loss: 0.2253 - val_acc: 0.7490
Epoch 5/20
0s - loss: 0.1771 - acc: 0.8062 - val_loss: 0.2402 - val_acc: 0.7407
Epoch 6/20
0s - loss: 0.1789 - acc: 0.8021 - val_loss: 0.2431 - val_acc: 0.7407
Epoch 7/20
0s - loss: 0.1789 - acc: 0.8031 - val_loss: 0.2227 - val_acc: 0.7778
Epoch 8/20
0s - loss: 0.1810 - acc: 0.8010 - val_loss: 0.2438 - val_acc: 0.7449
Epoch 9/20
0s - loss: 0.1711 - acc: 0.8134 - val_loss: 0.2365 - val_acc: 0.7490
Epoch 10/20
0s - loss: 0.1852 - acc: 0.7959 - val_loss: 0.2423 - val_acc: 0.7449
Epoch 11/20
0s - loss: 0.1889 - acc: 0.7866 - val_loss: 0.2523 - val_acc: 0.7366
Epoch 12/20
0s - loss: 0.1838 - acc: 0.8021 - val_loss: 0.2563 - val_acc: 0.7407
Epoch 13/20
0s - loss: 0.1835 - acc: 0.8041 - val_loss: 0.2560 - val_acc: 0.7325
Epoch 14/20
0s - loss: 0.1868 - acc: 0.8031 - val_loss: 0.2573 - val_acc: 0.7407
Epoch 15/20
0s - loss: 0.1829 - acc: 0.8072 - val_loss: 0.2581 - val_acc: 0.7407
Epoch 16/20
0s - loss: 0.1878 - acc: 0.8062 - val_loss: 0.2589 - val_acc: 0.7407
Epoch 17/20
0s - loss: 0.1833 - acc: 0.8072 - val_loss: 0.2613 - val_acc: 0.7366
Epoch 18/20
0s - loss: 0.1837 - acc: 0.8113 - val_loss: 0.2605 - val_acc: 0.7325
Epoch 19/20
0s - loss: 0.1906 - acc: 0.8010 - val_loss: 0.2555 - val_acc: 0.7407
Epoch 20/20
0s - loss: 0.1884 - acc: 0.8062 - val_loss: 0.2542 - val_acc: 0.7449

Model 2 Output

Train on 970 samples, validate on 243 samples
Epoch 1/20
0s - loss: 0.1735 - acc: 0.7876 - val_loss: 0.2386 - val_acc: 0.6667
Epoch 2/20
0s - loss: 0.1733 - acc: 0.7825 - val_loss: 0.1894 - val_acc: 0.7449
Epoch 3/20
0s - loss: 0.1781 - acc: 0.7856 - val_loss: 0.2028 - val_acc: 0.7407
Epoch 4/20
0s - loss: 0.1717 - acc: 0.8021 - val_loss: 0.2545 - val_acc: 0.7119
Epoch 5/20
0s - loss: 0.1757 - acc: 0.8052 - val_loss: 0.2252 - val_acc: 0.7202
Epoch 6/20
0s - loss: 0.1776 - acc: 0.8093 - val_loss: 0.2449 - val_acc: 0.7490
Epoch 7/20
0s - loss: 0.1833 - acc: 0.7897 - val_loss: 0.2272 - val_acc: 0.7572
Epoch 8/20
0s - loss: 0.1827 - acc: 0.7928 - val_loss: 0.2376 - val_acc: 0.7531
Epoch 9/20
0s - loss: 0.1795 - acc: 0.8062 - val_loss: 0.2445 - val_acc: 0.7490
Epoch 10/20
0s - loss: 0.1746 - acc: 0.8103 - val_loss: 0.2491 - val_acc: 0.7449
Epoch 11/20
0s - loss: 0.1831 - acc: 0.8082 - val_loss: 0.2477 - val_acc: 0.7449
Epoch 12/20
0s - loss: 0.1831 - acc: 0.8113 - val_loss: 0.2496 - val_acc: 0.7490
Epoch 13/20
0s - loss: 0.1920 - acc: 0.8000 - val_loss: 0.2459 - val_acc: 0.7449
Epoch 14/20
0s - loss: 0.1945 - acc: 0.7928 - val_loss: 0.2446 - val_acc: 0.7490
Epoch 15/20
0s - loss: 0.1852 - acc: 0.7990 - val_loss: 0.2459 - val_acc: 0.7449
Epoch 16/20
0s - loss: 0.1800 - acc: 0.8062 - val_loss: 0.2495 - val_acc: 0.7449
Epoch 17/20
0s - loss: 0.1891 - acc: 0.8000 - val_loss: 0.2469 - val_acc: 0.7449
Epoch 18/20
0s - loss: 0.1891 - acc: 0.8041 - val_loss: 0.2467 - val_acc: 0.7531
Epoch 19/20
0s - loss: 0.1853 - acc: 0.8072 - val_loss: 0.2511 - val_acc: 0.7449
Epoch 20/20
0s - loss: 0.1905 - acc: 0.8062 - val_loss: 0.2460 - val_acc: 0.7531
pr338

2 Answers


Do I use the "acc" (from the training data?) or the "val_acc" (from the validation data?)?

If you want to estimate how well your model generalizes to new data (which is probably what you want to do), then look at the validation accuracy, because the validation split contains only data that the model never sees during training and therefore cannot simply memorize.

If your training accuracy ("acc") keeps improving while your validation accuracy ("val_acc") gets worse, you are likely in an overfitting situation, i.e. your model is essentially starting to memorize the training data.
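
If it helps to see where those numbers come from in code, here is a minimal sketch (with `model` assumed to be an already compiled Keras model and `X`, `y` placeholder NumPy arrays); the metric keys differ between Keras versions:

```python
# Sketch: compare per-epoch training vs. validation accuracy.
# `model`, `X` and `y` are assumed to already exist (compiled model, data arrays).
history = model.fit(X, y, validation_split=0.2, epochs=20, verbose=2)

# Metric keys are 'acc'/'val_acc' in older Keras versions and
# 'accuracy'/'val_accuracy' in recent ones.
train_acc = history.history.get('acc', history.history.get('accuracy'))
val_acc = history.history.get('val_acc', history.history.get('val_accuracy'))

for epoch, (a, va) in enumerate(zip(train_acc, val_acc), start=1):
    print('epoch %2d  acc=%.4f  val_acc=%.4f' % (epoch, a, va))

# Rising acc together with falling val_acc over the epochs is the classic overfitting pattern.
```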

There are different acc and val_acc values for each epoch. How do I know the acc or val_acc for my model as a whole? Do I average the values across all epochs to get the accuracy of the model as a whole?

Each epoch is one training pass over all of your training data. During that pass the parameters of your model are adjusted according to your loss function. The result is a set of parameters with a certain ability to generalize to new data, and that ability is reflected by the validation accuracy. So think of every epoch as its own model, which can get better or worse if it is trained for another epoch. Whether it got better or worse is judged by the change in validation accuracy (better = validation accuracy increased). Therefore pick the model of the epoch with the highest validation accuracy. Don't average the accuracies over different epochs; that wouldn't make much sense. You can use the Keras callback ModelCheckpoint to automatically save the model with the highest validation accuracy (see the callbacks documentation).
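
A rough sketch of that, assuming `model`, `X` and `y` already exist and using an arbitrary file name; the monitored key is 'val_acc' in older Keras versions and 'val_accuracy' in recent ones:

```python
from keras.callbacks import ModelCheckpoint  # tf.keras: from tensorflow.keras.callbacks import ModelCheckpoint

# Overwrite the file only when the monitored validation accuracy improves,
# so it always holds the weights of the best epoch seen so far.
checkpoint = ModelCheckpoint('best_weights.h5',
                             monitor='val_acc',   # 'val_accuracy' in recent Keras versions
                             mode='max',
                             save_best_only=True,
                             save_weights_only=True,
                             verbose=1)

model.fit(X, y, validation_split=0.2, epochs=20, callbacks=[checkpoint])

# Later (e.g. for testing or production): rebuild and compile the same
# architecture, then restore the best epoch's weights.
model.load_weights('best_weights.h5')
```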

The highest validation accuracy for model 1 is 0.7778 (epoch 7) and the highest for model 2 is 0.7572 (epoch 7). Therefore you should view model 1 as better, though it is possible that that single peak was just a random outlier.

aleju
  • Model Checkpoint "saves the model weights after each epoch if the validation loss decreased". Is this "equivalent" to a higher validation accuracy? Looking at the numbers, I see that sometimes, although the validation loss decreases, the validation accuracy is not higher. Why is this the case? – pr338 Jan 10 '16 at 17:43
  • Say you have 4 examples for which your model should predict the label 1. Now it predicts the value 0.51 for each of them (all above the threshold 0.5, so the predictions are counted as correct by the accuracy measurement). At the next epoch it changes the values to 0.49, 0.49, 0.49 and 0.95. The value of the loss function will improve significantly (because of the big change from 0.51 to 0.95), but the accuracy will get worse, because three of the values are now below the threshold of 0.5 and are therefore viewed as label 0 (see the short numeric check after this comment thread). – aleju Jan 10 '16 at 18:15
  • @aleju Just wondering, will increasing the number of epochs make the model better? I am still very confused about how the number of epochs affects the accuracy of the model. – user10024395 Mar 28 '16 at 13:31
  • @aleju In fact I followed your advice and picked the model where val_acc is higher; however, it actually performs worse. What can be the reason? – user10024395 Mar 28 '16 at 14:18
  • @user136266 The validation accuracy is only an indication of how well your model is going to generalize to the "real world", i.e. to data which it hasn't seen during training. However, it is not a guarantee. Models with worse validation accuracy can generalize better; they are just less likely to do so. The validation accuracy becomes more meaningful if you use more validation examples, if these examples accurately reflect the "real world" (i.e. if the validation examples aren't skewed towards specific classes) and if your training data is not contained in your validation data. – aleju Mar 29 '16 at 11:00
  • And usually you should set the epochs to a high value and then just pick the model with the highest validation accuracy. It is expected that (usually) your validation accuracy keeps growing for some time and then starts to go down, while your training accuracy will keep improving (i.e. the model starts to memorize the training data). – aleju Mar 29 '16 at 11:02
  • @aleju How do I pick a model in Keras? For now I can see that it prints out the details of each epoch, but I have no idea how to pick a model. – user10024395 Mar 29 '16 at 18:00
  • Just use the `ModelCheckpoint` callback to automatically save the current model's weights to a file whenever the validation accuracy improves. Then during test or productive use, you rebuild and compile the same architecture (layers, activations, ...) and load the weights with `model.load_weights(filename)` (before calling `model.predict(...)` or something similar). – aleju Mar 29 '16 at 18:45
  • To save the best model, define `cb = [ModelCheckpoint("weights.h5", save_best_only=True, save_weights_only=True)]` and when training add the callback parameter: `model.fit(... callbacks=cb)` – Lavi Avigdor Mar 04 '17 at 18:22
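
To make the loss-vs-accuracy comment above concrete, here is a small self-contained check in plain NumPy, using the same four hypothetical predictions (mean binary cross-entropy as the loss, threshold 0.5 for accuracy):

```python
import numpy as np

y_true = np.array([1, 1, 1, 1])
epoch1 = np.array([0.51, 0.51, 0.51, 0.51])   # all just above the 0.5 threshold
epoch2 = np.array([0.49, 0.49, 0.49, 0.95])   # three slip below, one becomes very confident

def binary_crossentropy(y, p):
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

def accuracy(y, p):
    return float(np.mean((p > 0.5) == y))

# Loss improves (~0.67 -> ~0.55) even though accuracy drops from 1.00 to 0.25.
print(binary_crossentropy(y_true, epoch1), accuracy(y_true, epoch1))
print(binary_crossentropy(y_true, epoch2), accuracy(y_true, epoch2))
```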

You need to key on decreasing val_loss or increasing val_acc; ultimately it doesn't matter much, since the differences here are well within random/rounding error.

In practice, the training loss can drop significantly due to over-fitting, which is why you want to look at validation loss.

In your case, you can see that your training loss is not dropping, which means you are learning nothing after each epoch. It looks like there is nothing to learn in this model, aside from some trivial linear-like fit or cutoff value.

Also, when learning nothing, or only a trivial linear thing, you should see similar performance on training and validation (trivial learning is always generalizable). You should probably shuffle your data before using the validation_split feature.
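
A minimal sketch of that shuffle (with `X`, `y` and `model` assumed to already exist); this matters because `validation_split` takes the last fraction of the arrays exactly as you pass them in, without shuffling them for you:

```python
import numpy as np

# Shuffle features and labels together before fit(), so the validation split
# is not just the tail end of however the data happened to be ordered.
perm = np.random.permutation(len(X))
X, y = X[perm], y[perm]

model.fit(X, y, validation_split=0.2, epochs=20, verbose=2)
```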

Erik Aronesty