I'm trying to use a CNN to classify images, and as far as I can see, ReLU is a popular choice for the activation unit in each convolutional layer. Based on my understanding, ReLU keeps all positive image intensities and converts the negative ones to 0. To me, that looks like a processing step, not really a "firing" step at all. So what is the purpose of using ReLU here?
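
For example, here is roughly what I mean (a small NumPy sketch with made-up values, just to illustrate what I understand ReLU to be doing):

```python
import numpy as np

# A toy 3x3 feature map from a convolution (made-up values for illustration)
feature_map = np.array([[ 1.5, -0.3,  2.0],
                        [-1.2,  0.0,  0.7],
                        [ 3.1, -2.4, -0.1]])

# ReLU keeps the positive values and replaces the negative ones with 0
relu_output = np.maximum(feature_map, 0)
print(relu_output)
```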

- Possible duplicate of [this question](https://stackoverflow.com/questions/9782071/why-must-a-nonlinear-activation-function-be-used-in-a-backpropagation-neural-net). For more information specific to ReLU see [this question](https://stats.stackexchange.com/questions/126238/what-are-the-advantages-of-relu-over-sigmoid-function-in-deep-neural-networks). – jodag Nov 18 '17 at 20:12
- See also https://stats.stackexchange.com/questions/141960/deep-neural-nets-relus-removing-non-linearity – MSalters Nov 22 '17 at 14:41
1 Answer
First of all, it introduces non-linearity. Without it, the whole CNN would be nothing more than a succession of matrix multiplications and max poolings, so you wouldn't be able to approximate and learn complicated functions. But I imagine you are asking why ReLU in particular is popular. One reason that comes to mind is that other activation functions like tanh or sigmoid have a gradient saturation problem: once the value they output is near their maximum, their gradient becomes insignificant (just look at their graphs, e.g. on Wikipedia), and they kill the gradient upon backpropagation. ReLU doesn't have this problem. Additionally, the fact that ReLUs produce zeros for negative values means that the intermediate representations the network produces tend to be sparser.
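
As a quick illustration of the saturation point, here is a small NumPy sketch comparing the gradient of a sigmoid with that of ReLU (the input values are arbitrary):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s * (1 - s), which shrinks toward 0 as |x| grows
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # Derivative of ReLU: 0 for negative inputs, 1 for positive inputs
    return (x > 0).astype(float)

x = np.array([-10.0, -2.0, 0.5, 2.0, 10.0])
print("sigmoid gradient:", sigmoid_grad(x))  # about 4.5e-05 at |x| = 10 -> vanishing gradient
print("ReLU gradient:   ", relu_grad(x))     # stays 1 for every positive input
```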
