
I'm running an MLP to classify a set of values into 10 different classes.

Simplified down, I have a sonar which gives me 400 "readings" of an object. Each reading is a list of 1000 float values.

I have scanned 100 total objects and want to classify them and evaluate the model based on a leave-one-out cross validation.

For each object, I split the data into a training set of 99 objects and a test set of the remaining object. I feed the training set (99 objects, 99*400 "readings") into the MLP and use the test set (1 object, 1*400 "readings") to validate.
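Concretely, here is roughly how I build each split (a minimal sketch; the zero-filled arrays are just placeholders for my real sonar data and labels):

```python
import numpy as np

# Placeholder data matching the shapes described above:
#   readings: (100 objects, 400 readings per object, 1000 floats per reading)
#   labels:   one of 10 classes per object
readings = np.zeros((100, 400, 1000), dtype=np.float32)  # replace with real sonar data
labels = np.zeros(100, dtype=np.int64)                    # replace with real class labels

# Leave-one-object-out: each fold trains on 99 objects and tests on the held-out one.
for test_obj in range(len(readings)):
    train_objs = [i for i in range(len(readings)) if i != test_obj]
    X_train = readings[train_objs].reshape(-1, 1000)   # 99*400 readings
    y_train = np.repeat(labels[train_objs], 400)       # one label per reading
    X_test = readings[test_obj]                        # 1*400 readings
    y_test = np.repeat(labels[test_obj], 400)
    # ...train the MLP on (X_train, y_train), evaluate on (X_test, y_test)...
```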

My question is: How do I know which training epoch to use as the final "best" model? I googled around and some people said to use the epoch which had the best validation accuracy, but this seems like cheating to me. Shouldn't I instead pick a model based only on the statistics of the training data? (My thought process is that a random weight reshuffling in training could create an artificially high validation accuracy that doesn't actually provide a useful model for new objects that could be scanned in the future)

SO Answer that says to use the training epoch which gives the best validation accuracy:

What is the difference between train, validation and test set, in neural networks?

Best, Deckwasher

This looks like a math problem, not a programming problem. A very valid, well defined and clearly stated problem, but unfortunately on the wrong site. – Mad Physicist Jun 01 '16 at 20:22

3 Answers


This is called early stopping.

What you need is a validation set.

- After each epoch, compute your desired evaluation measure on the validation set.

- Always save the parameters of the best-performing model on the validation set in a variable.

- If the validation results do not improve for two (or n) consecutive epochs, stop training and reset the MLP to the best-performing parameters.

- Then compute the results on the test set with the best-performing model on the validation set that you saved before (see the sketch below).
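A minimal, framework-agnostic sketch of that loop (train_one_epoch and validation_accuracy are placeholders for whatever your MLP library provides, not real API calls):

```python
import copy

def train_with_early_stopping(model, train_one_epoch, validation_accuracy,
                              patience=5, max_epochs=200):
    # train_one_epoch(model): runs one pass over the training objects (placeholder)
    # validation_accuracy(model): accuracy on the validation set (placeholder)
    best_acc = float("-inf")
    best_model = copy.deepcopy(model)      # snapshot of the best-performing parameters
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch(model)
        val_acc = validation_accuracy(model)

        if val_acc > best_acc:             # new best: save parameters, reset patience
            best_acc = val_acc
            best_model = copy.deepcopy(model)
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                      # no improvement for `patience` epochs: stop

    return best_model, best_acc            # evaluate this model on the test set
```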

Ash

You want to optimize your generalization/true error (how good your predictions are on unseen data), which typically decomposes into the following (see here for a paper that includes this concept, albeit in another context: SGD + SVM):

  • approximation-error: how well the data can be described by your model
  • estimation-error: effect of minimizing empirical risk instead of the expected risk
  • optimization-error: measures the impact of the approximate optimization on the expected risk
    • The optimization error can be reduced by running the optimizer longer (which is your variable here)
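Written out (in the style of the Bottou & Bousquet decomposition; the linked paper may phrase it slightly differently), the excess error splits into exactly these three terms:

```latex
% E(f) = expected risk, f^* = best possible predictor,
% f_F^* = best predictor in the chosen model family F,
% f_n  = empirical-risk minimizer over F,
% \tilde{f}_n = approximate solution the optimizer actually returns.
\mathbb{E}\bigl[E(\tilde{f}_n) - E(f^*)\bigr]
  = \underbrace{\mathbb{E}\bigl[E(f_{\mathcal{F}}^*) - E(f^*)\bigr]}_{\text{approximation}}
  + \underbrace{\mathbb{E}\bigl[E(f_n) - E(f_{\mathcal{F}}^*)\bigr]}_{\text{estimation}}
  + \underbrace{\mathbb{E}\bigl[E(\tilde{f}_n) - E(f_n)\bigr]}_{\text{optimization}}
```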

You see that the true error is only partially described by your optimization error (the decision of when to stop), but a good cross-validation scheme can be much more precise in describing/evaluating the true error (that's basically why CV is done, at some cost). That is why CV-based choosing of the epoch to use is so common.

Of course it's also very important to make the cross-validation scheme somewhat sane. k-fold-based schemes with not too small k are often used (at least in non-NN applications; might be too costly for NNs).
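For example (a sketch assuming scikit-learn is available; fold at the object level so that all 400 readings of an object stay on the same side of the split):

```python
import numpy as np
from sklearn.model_selection import KFold

object_ids = np.arange(100)                     # one id per scanned object
kf = KFold(n_splits=10, shuffle=True, random_state=0)

for fold, (train_objs, test_objs) in enumerate(kf.split(object_ids)):
    # expand the object indices into reading indices in your own data layout,
    # then train and evaluate one model per fold
    print(f"fold {fold}: {len(train_objs)} train objects, {len(test_objs)} test objects")
```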

sascha

One way to decide when to stop is to evaluate the accuracy on the test set (or validation set) and print it after each epoch. Once the maximum number of epochs is reached, you can stop.

Another way is to pickle (in Python) or serialize (in Java) the set of weights and biases and store them in a file on disk whenever the accuracy of the current weights and biases is better than the current maximum.
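A rough sketch of that second approach in Python (the weights and accuracy values are assumed to come from your own training loop, not from a specific library):

```python
import pickle

class Checkpointer:
    # Pickle the weights and biases to disk whenever validation accuracy improves.
    def __init__(self, path="best_mlp.pkl"):
        self.path = path
        self.best_acc = float("-inf")

    def update(self, weights_and_biases, val_acc, epoch):
        if val_acc > self.best_acc:
            self.best_acc = val_acc
            with open(self.path, "wb") as f:
                pickle.dump(weights_and_biases, f)   # serialize the current best
            print(f"epoch {epoch}: new best accuracy {val_acc:.3f}, saved to {self.path}")

    def load_best(self):
        with open(self.path, "rb") as f:
            return pickle.load(f)
```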

Nagabhushan Baddi