
I am studying convolutional neural networks, and I am confused about some of the layers in a CNN.

Regarding ReLU... I only know that it can be seen as a sum of infinitely many logistic functions, but in the architecture diagrams a ReLU doesn't seem to connect to any upper layers. Why do we need ReLU, and how does it work?

Regarding Dropout... How does dropout work? I listened to a video talk by G. Hinton. He said there is a strategy which randomly ignores half of the nodes when training the weights, and halves the weights when predicting. He says it was inspired by random forests and works exactly the same as computing the geometric mean of these randomly trained models.

Is this strategy the same as dropout?

Can someone help me understand this?

  • A very good resource is the [CVPR 2014 Tutorial on Large-Scale Visual Recognition](https://sites.google.com/site/lsvrtutorialcvpr14/home/deeplearning) by [Marc'Aurelio Ranzato](http://www.cs.toronto.edu/~ranzato/). It introduces and details both topics. – deltheil Dec 05 '14 at 18:45
  • @deltheil I'm sorry, but I cannot find anything about dropout in the paper you linked. Searching the document for "dropout" returns three occurrences, all three just mentions that dropout is used. Do you have a page number where it details dropout? I already read it through but haven't found anything about dropout. – DBX12 Jun 24 '17 at 14:32

1 Answer


ReLU: The rectifier is an activation function f(x) = max(0, x) which can be used by neurons just like any other activation function; a node using the rectifier activation function is called a ReLU node. The main reason it is used is that it can be computed much more efficiently than conventional activation functions like the sigmoid and hyperbolic tangent, without making a significant difference to generalisation accuracy. The rectifier activation function is used instead of a linear activation function to add non-linearity to the network; otherwise the network would only ever be able to compute a linear function.
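For concreteness, here is a minimal NumPy sketch of the rectifier applied element-wise to a layer's pre-activations (the array names and values are illustrative, not taken from the answer):

```python
import numpy as np

def relu(x):
    # Rectifier: element-wise max(0, x)
    return np.maximum(0, x)

# Pre-activations of one layer for a batch of 2 examples, 4 units each
z = np.array([[-1.5, 0.3, 2.0, -0.2],
              [ 0.7, -3.1, 0.0, 1.2]])

a = relu(z)  # negatives become 0, positives pass through unchanged
print(a)
# [[0.  0.3 2.  0. ]
#  [0.7 0.  0.  1.2]]
```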

Dropout: Yes, the technique described is the same as dropout. The reason that randomly ignoring nodes is useful is that it prevents inter-dependencies from emerging between nodes (i.e. nodes do not learn functions which rely on the input from one particular other node), which allows the network to learn more robust relationships. Implementing dropout has much the same effect as taking the average of a committee of networks, but the cost is significantly lower in both time and storage.
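A minimal sketch of the scheme Hinton describes (drop each node with probability 0.5 during training, scale down when predicting); the layer sizes and names here are illustrative assumptions, not taken from the answer. Note that scaling the activations by the keep probability at test time is equivalent to halving the outgoing weights:

```python
import numpy as np

rng = np.random.default_rng(0)
p_keep = 0.5  # keep each hidden node with probability 0.5

def forward_train(x, W):
    # Randomly zero out roughly half of the hidden activations on each pass
    h = np.maximum(0, x @ W)            # ReLU hidden layer
    mask = rng.random(h.shape) < p_keep
    return h * mask

def forward_predict(x, W):
    # No dropping at test time; scale by p_keep instead, so the expected
    # activations match those seen during training (same as halving weights)
    h = np.maximum(0, x @ W)
    return h * p_keep

x = rng.standard_normal((1, 8))         # one input example (assumed 8 features)
W = rng.standard_normal((8, 4))         # weights of a small hidden layer
print(forward_train(x, W))              # a different random subset is zeroed each call
print(forward_predict(x, W))            # deterministic, scaled activations
```

Because a different random half of the nodes is zeroed on every training pass, no node can come to rely on a specific other node being present, which is what gives the averaging-over-many-networks effect.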

  • Does ReLU connect to an upper layer? I checked the architecture of AlexNet for the ImageNet task. It seems that ReLU is an independent layer. If so, it doesn't pass values to upper layers. Why do we need this "irrelevant" layer? – user3783676 Dec 11 '14 at 18:27
  • 4
  • A ReLU is just a single neuron which implements the rectifier activation function *max(0, n)*, not an entirely new layer. Although the report doesn't state the exact details, it looks as though this activation function is used on each neuron in the network, in both the convolutional and fully connected layers. – Hungry Dec 12 '14 at 08:57
  • The ReLU function still looks kind of linear. Is it able to solve problems as well as the sigmoid? – Andrzej Gis Sep 18 '17 at 08:43