
I am learning and experimenting with neural networks and would like to have the opinion from someone more experienced on the following issue:

When I train an autoencoder in Keras ('mean_squared_error' loss function and the SGD optimizer), the validation loss gradually goes down and the validation accuracy goes up. So far so good.

However, after a while, the loss keeps decreasing but the accuracy suddenly falls back to a much lower level.

  • Is it 'normal' or expected behavior that the accuracy goes up very fast, stays high, and then suddenly falls back?
  • Should I stop training at the maximum accuracy even if the validation loss is still decreasing? In other words, should I monitor val_acc or val_loss for early stopping?
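To make the second bullet concrete, here is a minimal, Keras-independent sketch of what patience-based early stopping on a monitored quantity (such as val_loss) does. The function name and the patience value are illustrative, not part of any Keras API:

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch at which patience-based early stopping would trigger.

    Stops once the monitored quantity has not improved for `patience` epochs.
    """
    best_loss, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # no improvement for `patience` epochs: stop here
    return len(val_losses) - 1  # trained to the end without triggering
```

Monitoring val_acc instead would use the same logic with the comparison reversed (stop when accuracy stops improving).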

See images:

Loss: (green = val, blue = train)

Accuracy: (green = val, blue = train)

UPDATE: The comments below pointed me in the right direction and I think I understand it better now. It would be nice if someone could confirm that the following is correct:

  • the accuracy metric measures the % of cases where y_pred == y_true and thus only makes sense for classification.

  • my data is a combination of real-valued and binary features. The reason the accuracy graph rises very steeply and then falls back, while the loss continues to decrease, is that around epoch 5000 the network probably predicted about 50% of the binary features correctly. As training continues, around epoch 12000, the prediction of the real-valued and binary features together improves (hence the decreasing loss), but the prediction of the binary features alone becomes slightly less correct. Therefore the accuracy falls while the loss decreases.
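This effect can be checked numerically. A toy sketch (the numbers are made up; it assumes Keras's binary-style accuracy, which rounds predictions before comparing them to the targets): between the two simulated "epochs", the overall MSE goes down while the rounded accuracy over all features goes down too.

```python
import numpy as np

# Toy reconstruction target: 2 real-valued features + 2 binary features.
y_true = np.array([[0.5, 1.2, 1.0, 0.0]])

# Earlier epoch: binary features round to the right values,
# real-valued features are still predicted poorly.
y_early = np.array([[0.9, 0.5, 0.9, 0.1]])

# Later epoch: real-valued features are much closer (lower MSE),
# but one binary feature now rounds to the wrong value.
y_late = np.array([[0.55, 1.15, 0.45, 0.05]])

def mse(a, b):
    return np.mean((a - b) ** 2)

def acc(a, b):
    # Keras-style binary accuracy: round predictions, compare element-wise.
    return np.mean(np.round(b) == a)

print(mse(y_true, y_early), acc(y_true, y_early))  # higher MSE, higher acc
print(mse(y_true, y_late), acc(y_true, y_late))    # lower MSE, lower acc
```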

Mark
  • Are you using MSE for a classification task? – Marcin Możejko May 10 '16 at 15:05
  • This is an interesting plot. While I have no experience with autoencoders, I wonder if this is just some extreme case of overfitting. Did you try lowering your network complexity (smaller, or more regularization)? (Maybe also check with an increased validation subset?) I can imagine that it will look different. – sascha May 10 '16 at 15:26
  • @MarcinMożejko: I'm using mse, but it's autoencoder, not classification. – Mark May 10 '16 at 15:40
  • 1
    Depends on what losses are calculated (don't know if they are always the same; MSE vs. accuracy sound different). And there is also the difference in regularization (validation deactivates dropout and l1/l2 regularizers, I think). I would just try these changes if it's not too heavy computationally. In general: you could use smaller datasets while debugging stuff like that. – sascha May 10 '16 at 15:48
  • I also think that this plot looks quite strange (up, stable, down; quite symmetrical; but I'm no expert). But the general observation of a decreasing training loss (even monotone) and an increasing validation loss is nothing special. Every NN which is too big will eventually do that (it memorized the samples). – sascha May 10 '16 at 15:58
  • You are right about that @sascha, however here the training and validation loss move 'in sync'. The validation loss is lower than the training loss. To rephrase my question: comparing epoch 10K with 15K, the loss is lower, so the network predicts better results at epoch 15K. But looking at the accuracy, the acc is much lower at epoch 15K, so its predictions are much less accurate. Isn't that contradictory? – Mark May 10 '16 at 16:12
  • I think you need to show some code / add more information for someone to help you. As I don't have experience with autoencoders, I don't think I can help you, but I can give one more idea: in Keras, for example, train loss is calculated over batches, not the complete data, while validation loss is calculated on the full val dataset. Maybe this behaviour *can be* possible if there is some autocorrelation and no/wrong shuffling during learning. It is very important to check all the internals of the code. – sascha May 10 '16 at 16:20
  • Thanks for the tips @sascha. I know how the loss is calculated, but to better understand the difference, I will need to find out how the accuracy is calculated in Keras. – Mark May 10 '16 at 17:15
  • Does it help?: https://datascience.stackexchange.com/q/37186/80430 – hafiz031 Nov 03 '21 at 05:21

1 Answer


If the predictions are real-valued, i.e. the data is continuous rather than discrete, use MSE (mean squared error), because it measures the distance between the predicted and true values.

But in the case of discrete values, i.e. classification or clustering, use accuracy, because the target values are only 0 or 1. The concept of MSE is not applicable there; instead use accuracy = (number of correct predictions / total predictions) * 100.
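A small illustration of the two metrics on discrete targets (the arrays are made-up toy data, not from the question):

```python
import numpy as np

# Discrete (binary) targets and predictions -- accuracy is meaningful here.
y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0])

accuracy = np.mean(y_pred == y_true) * 100  # % of correct predictions
mse = np.mean((y_pred - y_true) ** 2)       # distance-based; suits continuous data

print(accuracy)  # 80.0 -- 4 of 5 predictions match
print(mse)       # 0.2
```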

Naren Babu R