I’ve been trying to learn how neural networks work but I can’t get my head around hidden layers. If the hidden neurones all have the same inputs and they all have random weights (at the start), why don’t the weights (through training) become similar across the neurones? What causes the neurones to do separate tasks, e.g. look for different patterns in a number?
1 Answer
Consider how the learning process works: you are optimising a loss function $L(w)$ that depends on the weights of the network $w$. Note that learning generally happens on the weights, not on the neurons.

The usual way to learn is through gradient descent, which means you iteratively decrease $L(w)$ by making small changes to the weights $w$, using local information about how $L(w)$ behaves when $w$ is slightly tweaked (this is exactly what the gradient of $L$ with respect to $w$ measures). Following that gradient locally gives the best way to tweak $w$ so that $L(w)$ decreases, and since the different components of $w$ start out different (thanks to the random initialisation), there is no reason why they should evolve in the same direction, even when they are connected to the same neurons.
Note that this is only true because we initialise the weights randomly. If we set them all to the same initial value, say 1e-3, and we are using a symmetric architecture, as is the case for fully-connected layers, then the gradient will be symmetric as well and learning will stall: the gradient holds the same value for every weight, so the weights stay identical to each other after every update. For a more intuitive explanation of why that happens, check this answer. You can also look up "symmetry breaking in machine learning" for more on this topic.
