
I understand the concept of having multiple layers, backpropagation, etc. I even understand that an activation function squashes the output into a certain range, depending on the function used. But why do we even require this? What happens if we just continue with the raw result, without an activation function?

Please help me understand, but in plain English - no graphs/formulas please - I want to understand the concept behind it.

Ravi
  • Possible duplicate of [Why must a nonlinear activation function be used in a backpropagation neural network?](https://stackoverflow.com/questions/9782071/why-must-a-nonlinear-activation-function-be-used-in-a-backpropagation-neural-net) – Dr. Snoopy Jan 26 '18 at 08:12

2 Answers


There are a few reasons to use an activation function; the most common one is that the output needs to lie within a certain range by its nature, e.g. if the output is a probability, which is only valid in the range [0, 1].
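To make that concrete, here is a minimal NumPy sketch (my own illustration, not part of the original answer): the sigmoid function maps any real-valued network output into the interval (0, 1), which is why it is a common choice when the output is read as a probability.

```python
import numpy as np

def sigmoid(z):
    """Squash any real-valued input into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Raw network outputs ("logits") can be any real number...
logits = np.array([-5.0, -0.5, 0.0, 2.0, 10.0])

# ...but after the sigmoid they are all valid probabilities.
print(sigmoid(logits))  # roughly [0.0067 0.3775 0.5 0.8808 1.0]
```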

Fermat's Little Student

If your activation function is just a(z) = z (a linear neuron), the activation is simply the weighted input (plus bias). In that case, each layer's activation is a linear function of the previous layer's activation, and you can quite easily convince yourself that the combined effect of many layers (i.e. a deep network) is still a linear function. That means you could get exactly the same result with just an input layer and an output layer, without any hidden neurons. In other words, you would not gain any additional expressive power by adding hidden layers, so there is no advantage to going "deep".
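A small NumPy sketch of that argument (my own illustration, not from the original answer): two stacked linear layers produce exactly the same output as a single linear layer whose weight matrix is the product of the two.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two layers with the identity (linear) activation: a(z) = z
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Forward pass through both layers, no nonlinearity anywhere
h = W1 @ x + b1
y = W2 @ h + b2

# The same mapping collapses into a single linear layer
W = W2 @ W1
b = W2 @ b1 + b2
y_single = W @ x + b

print(np.allclose(y, y_single))  # True: the "deep" linear net is just one linear layer
```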

rain city
  • Thank you! Makes sense. So can I come up with my own activation function instead of going with one of the standard ones (ReLU, tanh, etc.), given that my activation function is zero mean and easy on computation? My question is just to clarify more on the same point – Ravi Jan 28 '18 at 00:20
  • If you have a good idea, you could try something else. It doesn't even necessarily have to be antisymmetric around zero (I assume that's what you mean by zero mean) - rectified linear units aren't, either. – rain city Jan 28 '18 at 01:07
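Following up on that comment exchange, a minimal sketch of what a home-made activation could look like. The softsign function z / (1 + |z|) is used here purely as an illustrative example because it is zero-centred and cheap to compute; nothing in the thread prescribes this particular choice.

```python
import numpy as np

def softsign(z):
    """A cheap, zero-centred nonlinearity: maps the reals into (-1, 1)."""
    return z / (1.0 + np.abs(z))

# Any such nonlinear, (sub)differentiable function can in principle be
# dropped in where ReLU or tanh would normally go.
z = np.linspace(-10, 10, 5)
print(softsign(z))  # roughly [-0.91 -0.83  0.    0.83  0.91]
```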