33

What does the number of hidden layers in a multilayer perceptron neural network do to the way the network behaves? The same question applies to the number of nodes in the hidden layers.

Let's say I want to use a neural network for handwritten character recognition. In this case I would feed pixel colour intensity values to the input nodes and use character classes as the output nodes.

How would I choose the number of hidden layers and nodes to solve such a problem?

gintas
  • 2,118
  • 1
  • 18
  • 28
  • Just to make sure we know where to start: do you know what you need a hidden layer for? By the way, I do not think you can get a perfect answer to this question – Tim Feb 24 '12 at 19:16
  • From what I understand, hidden layers generally allow modelling more complex relationships. I am aware that there might be no perfect answer, but what should I look for when deciding on the number of layers/nodes? – gintas Feb 24 '12 at 19:54
  • You should start by understanding why you even need hidden layers (XOR). – Tim Feb 25 '12 at 01:35
  • 2
    [How many hidden layers should I use?](ftp://ftp.sas.com/pub/neural/FAQ3.html#A_hl) [How many hidden units should I use?](ftp://ftp.sas.com/pub/neural/FAQ3.html#A_hu) – Birol Kuyumcu Feb 25 '12 at 12:50
  • 4
    Possible duplicates: [What is the criteria for choosing number of hidden layers and nodes in hidden layer?](http://stackoverflow.com/questions/10565868/what-is-the-criteria-for-choosing-number-of-hidden-layers-and-nodes-in-hidden-la?lq=1) [Estimating the number of neurons and number of layers of an artificial neural network](http://stackoverflow.com/questions/3345079/estimating-the-number-of-neurons-and-number-of-layers-of-an-artificial-neural-ne) – eric-haibin-lin May 17 '15 at 13:16
  • Possible duplicate of [multi-layer perceptron (MLP) architecture: criteria for choosing number of hidden layers and size of the hidden layer?](http://stackoverflow.com/questions/10565868/multi-layer-perceptron-mlp-architecture-criteria-for-choosing-number-of-hidde) – OmG Feb 16 '17 at 14:17
  • Possible duplicate of [Role of Bias in Neural Networks](https://stackoverflow.com/questions/2480650/role-of-bias-in-neural-networks) – Joshua Nozzi Jul 20 '18 at 17:07

5 Answers

19

Note: this answer was correct at the time it was made, but has since become outdated.


It is rare to have more than two hidden layers in a neural network. The number of layers will usually not be a parameter of your network that you worry much about.

Although multi-layer neural networks with many layers can represent deep circuits, training deep networks has always been seen as somewhat of a challenge. Until very recently, empirical studies often found that deep networks generally performed no better, and often worse, than neural networks with one or two hidden layers.

Bengio, Y. & LeCun, Y., 2007. Scaling learning algorithms towards AI. Large-Scale Kernel Machines, (1), pp.1-41.

The cited paper is a good reference for learning about the effect of network depth, recent progress in teaching deep networks, and deep learning in general.

Don Reba
  • 13,814
  • 3
  • 48
  • 61
  • This is correct. Technically, due to `attenuation` problems, models such as the backpropagation-trained multilayer perceptron have issues with too many layers. If you are going to have many hidden layers, you will want to look into deep learning, which can address this issue. – bean5 Oct 23 '13 at 07:39
  • 1
    This view is kind of old, though. If you use pre-training, it has been proved that by increasing the number of layers you decrease the upper bound on the error. Also, Hinton has some experiments which show that more hidden units mean a better representation of the input and hence better results. This is especially true when using rectified linear units. – elaRosca Jan 12 '14 at 10:06
  • Maybe this was true in 2012, but I'm not sure it is now. I'd almost guess it's rare to not have more than two layers. – chris Feb 16 '17 at 19:33
  • @ChrisAnderson, very true! And we are using different kinds of networks, too. – Don Reba Feb 16 '17 at 19:42
  • 2
    I want to point out that the quote above was from other authors whom Bengio and LeCun mention in their paper, i.e., [Tesauro, 1992]. – Long May 01 '19 at 09:01
8

The general answer for picking hyperparameters is to cross-validate: hold out some data, train networks with different configurations, and use the one that performs best on the held-out set.
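
A minimal sketch of this approach in Python, assuming scikit-learn's MLPClassifier and its bundled digits dataset (neither is named in the answer; the candidate topologies are illustrative):

```python
# Hold-out validation over a handful of candidate MLP topologies.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)  # 8x8 handwritten digits, 10 classes
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Train each configuration and keep the one scoring best on the held-out set.
candidates = [(32,), (64,), (32, 32), (64, 32)]
best_score, best_layers = -1.0, None
for layers in candidates:
    clf = MLPClassifier(hidden_layer_sizes=layers, max_iter=500,
                        random_state=0).fit(X_train, y_train)
    score = clf.score(X_val, y_val)
    if score > best_score:
        best_score, best_layers = score, layers

print(best_layers, best_score)
```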

Rob Neuhaus
  • 9,190
  • 3
  • 28
  • 37
  • 4
    Ok, that is one solid approach. But is there a way to guesstimate it? Something like, this data could be pretty well explained with 10 principal components, so we should have around 10 hidden nodes arranged in 2 layers? – gintas Feb 24 '12 at 21:50
5

Most of the problems I have seen were solved with 1-2 hidden layers. It is proven that MLPs with only one hidden layer are universal function approximators (Hornik et al.). More hidden layers can make the problem easier or harder; you usually have to try different topologies. I have heard that you cannot add an arbitrary number of hidden layers if you want to train your MLP with backprop, because the gradient becomes too small in the first layers (the vanishing gradient problem; I have no reference for that, but see the sketch below). Still, there are some applications where people have used up to nine layers. Maybe you are also interested in a standard benchmark problem which is solved by different classifiers and MLP topologies.
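
To illustrate the gradient remark above, here is a small numpy sketch (my illustration, not from the answer) showing how gradient magnitudes decay towards the first layers of a deep sigmoid MLP:

```python
# Why gradients shrink in the first layers of a deep sigmoid MLP:
# each backward step multiplies the error signal by sigmoid'(z) <= 0.25,
# and saturated units push that factor towards zero.
import numpy as np

rng = np.random.default_rng(0)
n_layers, width = 9, 50

# Random weights and a random input column vector.
Ws = [rng.normal(0.0, 1.0, (width, width)) for _ in range(n_layers)]
x = rng.normal(0.0, 1.0, (width, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -60, 60)))  # clip avoids overflow

# Forward pass, keeping the activations for backprop.
acts = [x]
for W in Ws:
    acts.append(sigmoid(W @ acts[-1]))

# Backward pass for a dummy loss: the sum of the outputs.
delta = np.ones_like(acts[-1])
for i in reversed(range(n_layers)):
    delta = delta * acts[i + 1] * (1 - acts[i + 1])  # multiply by sigmoid'
    grad_W = delta @ acts[i].T                       # dLoss/dW_i
    print(f"layer {i}: mean |grad| = {np.abs(grad_W).mean():.2e}")
    delta = Ws[i].T @ delta                          # push to the layer below
```

Running this prints per-layer gradient magnitudes that shrink rapidly for the layers closest to the input, which is exactly what makes very deep sigmoid MLPs hard to train with plain backprop.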

Richie Bendall
  • 7,738
  • 4
  • 38
  • 58
alfa
  • 3,058
  • 3
  • 25
  • 36
3

Cross-validation over different model configurations (number of hidden layers, neurons per layer) will, as the other answers note, lead you to a better configuration.

Beyond that, one approach is to train a model as big and deep as possible and to use dropout regularization to turn off some neurons and reduce overfitting.

The reference for this approach is this paper: https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf
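
A hedged sketch of this "train big, regularize with dropout" idea, assuming Keras (the paper itself is framework-agnostic; the layer widths here are illustrative and the 0.5 rate is the paper's usual choice for hidden units):

```python
# A deliberately large MLP with dropout between the hidden layers.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(784,)),             # e.g. 28x28 pixel intensities
    layers.Dense(2048, activation="relu"),
    layers.Dropout(0.5),                    # randomly silence half the units
    layers.Dense(2048, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax")  # e.g. 10 character classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Dropout is only active during training; at inference time Keras automatically uses the full network, which approximates averaging over the ensemble of thinned networks described in the paper.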

Irtaza
  • 599
  • 10
  • 18
2

All the above answers are of course correct, but just to add some more ideas: some general rules of thumb follow, based on the paper 'Approximating Number of Hidden Layer Neurons in Multiple Hidden Layer BPNN Architecture' by Saurabh Karsoliya.


In general:

  • The number of hidden layer neurons is 2/3 (or 70% to 90%) of the size of the input layer. If this is insufficient, the number of output layer neurons can be added later on.
  • The number of hidden layer neurons should be less than twice the number of neurons in the input layer.
  • The size of the hidden layer should be between the input layer size and the output layer size.
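
As a quick worked example (my own illustration, not from the paper), applying those rules to the question's character-recognition setup with, say, 28x28 = 784 input pixels and 26 output classes:

```python
# Hypothetical sizes for a handwritten-character MLP, per the rules above.
n_in, n_out = 784, 26

rule1 = int(2 / 3 * n_in)  # 2/3 of the input layer -> ~522 neurons
rule2 = 2 * n_in           # upper bound: stay below 1568 neurons
print(f"rule 1 suggests ~{rule1} hidden neurons")
print(f"rule 2 caps the hidden layer below {rule2} neurons")
print(f"rule 3 keeps it between {n_out} and {n_in} neurons")
```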

Always keep in mind that you need to explore and try a lot of different combinations. Also, using GridSearch you can find the "best" model and parameters.

E.g. we can do a GridSearch in order to determine the "best" size of the hidden layer.
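
A minimal sketch of such a GridSearch, assuming scikit-learn's MLPClassifier and GridSearchCV (the answer names no library; the grid values are illustrative):

```python
# Grid search over candidate hidden-layer sizes with 3-fold cross-validation.
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
param_grid = {"hidden_layer_sizes": [(32,), (64,), (128,), (64, 64)]}
search = GridSearchCV(MLPClassifier(max_iter=500, random_state=0),
                      param_grid, cv=3).fit(X, y)
print(search.best_params_)
```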

seralouk
  • 30,938
  • 9
  • 118
  • 133