
In TensorFlow, I have been training a CNN for a fixed number of epochs, saving checkpoints at a specified epoch interval. To evaluate the model, a checkpoint is restored and prediction is performed on the validation dataset.

I want to automate the learning process instead of using a fixed number of epochs. Could you explain how the loss value over mini-batches can be used to determine the stopping point? Also, please help me with implementing learning rate decay in TensorFlow. Which is better, constant or exponential decay, and how do I determine the decay factor?

Manjusha K

1 Answer


First, for the number of iterations: you can exit training when your loss has stopped improving on the batches, i.e. when the difference between two loss values averaged across batches (to reduce batch-to-batch fluctuations) is less than some threshold.
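As an illustration only, here is a minimal sketch of that idea (the helper name `should_stop` and the values of `window` and `threshold` are placeholders I made up, not anything from TensorFlow):

    import numpy as np

    # Illustrative sketch: stop when the loss, averaged over windows of
    # mini-batches to smooth out batch-to-batch fluctuations, stops improving.
    # `window` and `threshold` are hyperparameters you would have to tune.
    def should_stop(batch_losses, window=100, threshold=1e-4):
        if len(batch_losses) < 2 * window:
            return False  # not enough history yet
        prev_avg = np.mean(batch_losses[-2 * window:-window])
        curr_avg = np.mean(batch_losses[-window:])
        # Stop if the averaged loss improved by less than the threshold.
        return (prev_avg - curr_avg) < threshold

    # Inside the training loop you would do something like:
    # batch_losses.append(sess.run([train_op, loss], feed_dict=...)[1])
    # if should_stop(batch_losses):
    #     break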

But you probably realized that this threshold is a hyperparameter too! In fact there have been quite a few attempts to completely automate ML, but no matter what you do you still end up with some hyperparameters.

Secondly, the decay factor is used when you feel the loss has stopped improving and think you are stuck in a local minimum, oscillating in and out of the well without actually settling into it (this metaphor only really works in 2 dimensions, but I still find it useful).

Almost every time it is done in the literature it looks very hand-made: you train for, say, 200 epochs, you see that the loss has reached a plateau, so you decrease your learning rate with a step function (argument staircase=True in TF), and then you repeat.

What is commonly done is to divide the learning rate by 10 at each decay step (an exponential decay with rate 0.1), but as before this is quite arbitrary!

For details on how to implement learning rate decay in TF you can see dga's answer in this SO question. It is pretty straightforward!
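For concreteness, a minimal sketch of what that looks like with the TF 1.x op tf.train.exponential_decay (the numbers here are arbitrary placeholder values, and `loss` stands for whatever loss tensor your model defines):

    import tensorflow as tf

    # Sketch of step-wise (staircase) exponential learning rate decay in TF 1.x.
    global_step = tf.Variable(0, trainable=False, name="global_step")

    learning_rate = tf.train.exponential_decay(
        learning_rate=0.1,      # initial learning rate (placeholder value)
        global_step=global_step,
        decay_steps=10000,      # how many steps between decays (placeholder)
        decay_rate=0.1,         # divide the learning rate by 10 at each decay
        staircase=True)         # step function rather than a smooth exponential

    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    # `loss` is assumed to be the loss tensor of your model.
    train_op = optimizer.minimize(loss, global_step=global_step)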

What can help with the schedule and the values you use is cross-validation, but oftentimes you can simply look at your loss curve and do it by hand.

There is no silver bullet in deep learning; it is just trial and error.

jeandut
  • If you think it solved your problem you can accept it; if it only helped, you can upvote it. – jeandut Sep 30 '16 at 08:58
  • Jean, can you suggest other stopping criteria useful in CNN learning besides the loss function? I have seen something like the gap between training and validation accuracy. – Manjusha K Oct 01 '16 at 05:27
  • You also have what I would say is the most important stopping criterion: as soon as your validation loss has stopped decreasing, or has even started to rise, it is a serious sign of overfitting and you should absolutely stop training. – jeandut Oct 01 '16 at 10:07
  • Another one: after a few iterations your training loss should be lower than your validation loss. If you do not observe that, it means there is a problem in how you defined the sets or trained your algorithm. – jeandut Oct 01 '16 at 10:10
  • In light of what I said, if the gap between training and validation loss keeps increasing and goes past a threshold, it means either that your training loss is still decreasing while your validation loss has stopped decreasing (or has gone up), or that your training loss has stopped decreasing while the validation loss went up. Either way that is not good, as I told you, so you could build a stopping criterion out of that gap. – jeandut Oct 01 '16 at 10:17
  • If you are working with CNNs, once you have caught a glimpse of your training and validation curves and they look OK, take a look at your filters: it is often they that will tell you whether you did a good job or not. Once you get the regularization parameters and architecture right, your learning rate schedule should not matter much. – jeandut Oct 01 '16 at 10:22
  • Thank you so much for the valuable suggestions :) I think I can now go ahead and implement the stopping criteria in my learning process. – Manjusha K Oct 03 '16 at 04:52
  • Jean, during implementation, after a fixed number of iterations I can run an evaluation step in which I calculate the training loss (over the whole training dataset) and the validation loss. Is this the right way, or do I need to calculate the average training loss obtained on mini-batches over the iterations? – Manjusha K Oct 03 '16 at 08:52
  • There is no right way in DL! ^^ You can calculate the training loss per batch, but you will get a very noisy signal, so you will have to smooth it out with a moving average. Or you can calculate the loss on the whole training set, but that will often be too expensive or too big for memory for most applications. For more information I advise you to read an open course on DL such as Michael Nielsen's ebook or Karpathy's CS231n. – jeandut Oct 03 '16 at 11:59
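Combining the ideas from these comments, here is a hedged sketch of what such a loop could look like: the per-batch training loss is smoothed with a moving average, and training stops once the validation loss has not improved for a fixed number of evaluations. The names `patience`, `eval_every`, and `evaluate_on_validation_set` are illustrative placeholders, not part of any API:

    import numpy as np

    def moving_average(values, window=100):
        """Smooth a noisy per-batch loss signal with a simple moving average."""
        if not values:
            return float("inf")
        return float(np.mean(values[-window:]))

    best_val_loss = float("inf")
    evals_without_improvement = 0
    patience = 5           # stop after this many evaluations with no improvement
    eval_every = 1000      # run an evaluation every `eval_every` iterations
    batch_losses = []      # append each mini-batch training loss here

    # Inside the training loop, every `eval_every` iterations:
    #     smoothed_train_loss = moving_average(batch_losses)
    #     val_loss = evaluate_on_validation_set()   # hypothetical helper
    #     if val_loss < best_val_loss:
    #         best_val_loss = val_loss
    #         evals_without_improvement = 0
    #         # save a checkpoint here
    #     else:
    #         evals_without_improvement += 1
    #     if evals_without_improvement >= patience:
    #         break  # validation loss stopped improving: likely overfitting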