I understand the concept of having muliple layers, backpropagation, etc. I even understand that an activation function would squash the output to a certain range based on the activation function used. But why do we even require this? What happens if we continue with the actual result without an activation function?
Please help me understand, but in pure english - no graphs/formulas please - i want to understand the concept behind it