Referring to this answer on choosing number of hidden layers and units in a NN:
https://stackoverflow.com/a/10568938/2265724
The post suggests adding the number of hidden units until the generalization error starts to increase.
But my problem is the learning rate. Given a fixed number of hidden units (i.e. one data point in the graph, one particular architecture, say 10 hidden units), how do I set the learning rate and how many epochs should I train for?
1. use a fixed learning rate (after checking that it converges, i.e. the cost drops) and run for n epochs, or until the cost (or validation error) plateaus (assuming it drops in a nice asymptotic fashion)
2. as in 1 with early stopping
3. as in 1 or 2, but trying various different learning rates in a certain (linear or log) range
4. as in 3, including learning rate decay
5. as in 3 or 4, including weight decay as regularization, or perhaps better, dropout
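To make the options concrete, here is a minimal sketch of what I mean by strategies 3 and 5 combined with the outer architecture loop from the linked answer: for each hidden-layer size, sweep learning rates on a log scale and weight-decay values, keep the best validation error, and stop growing the layer once that error starts to rise. The `train_and_eval` function here is a hypothetical placeholder (it fakes a smooth response surface so the loop runs); a real version would train the network with early stopping and return the lowest validation error seen.

```python
import math

def train_and_eval(n_hidden, lr, weight_decay, max_epochs=200, patience=10):
    # Hypothetical stand-in for a real training run; a real version would
    # train the net (with early stopping via `patience`) and return the
    # lowest validation error observed. This fake surface is minimized
    # at lr = 1e-3 and weight_decay = 0, and improves with more units.
    return (math.log10(lr) + 3) ** 2 * 0.01 + 1.0 / n_hidden + weight_decay

def search(n_hidden):
    """For one architecture, sweep learning rates on a log scale
    (strategy 3) and weight decay (strategy 5), keeping the best."""
    best_err, best_cfg = float("inf"), None
    for lr in (1e-4, 1e-3, 1e-2, 1e-1):       # log-spaced learning rates
        for wd in (0.0, 1e-4, 1e-3):          # weight-decay grid
            val_err = train_and_eval(n_hidden, lr, wd)
            if val_err < best_err:
                best_err, best_cfg = val_err, {"lr": lr, "weight_decay": wd}
    return best_err, best_cfg

# Outer loop from the linked answer: grow the hidden layer until the
# best achievable validation error stops improving.
prev_err = float("inf")
for n_hidden in (5, 10, 20, 40):
    err, cfg = search(n_hidden)
    print(n_hidden, round(err, 4), cfg)
    if err > prev_err:   # generalization error started to increase
        break
    prev_err = err
```

Even this sketch shows the cost: the inner grid is 12 training runs per architecture, multiplied by however many hidden-unit values the outer loop visits.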
The number of hyperparameters to tune increases from 1 to 5. Option 1 is quickest but doesn't sound satisfactory (why not try other learning rates?). Options 3-5 are time-consuming, because if I am not happy with the result, I need to try another architecture by increasing the number of hidden units, and repeat until the graph shown in the post is obtained.
Am I understanding and practicing this correctly?