We use ReLU instead of the sigmoid activation function because it avoids the vanishing and exploding gradient problems that plague sigmoid-like activation functions.
Leaky-ReLU is one of ReLU's improvements. Everyone talks about the advantages of Leaky-ReLU, but what are its disadvantages?
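For reference, here is a minimal NumPy sketch of the two activations being compared (the 0.01 slope is just the common default, not something fixed by the question):

```python
import numpy as np

def relu(z):
    # ReLU: zero for negative inputs, identity for positive inputs
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # Leaky ReLU: small slope alpha for negative inputs instead of a hard zero
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z))        # negative inputs clipped to 0
print(leaky_relu(z))  # negative inputs scaled by alpha instead
```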

2 Answers
ReLU replaced sigmoid in hidden layers because it generally yields better results for general-purpose applications, but it really depends on your case, and another activation function might work better. Leaky ReLU helps with the vanishing gradient problem.
I think the main disadvantage of Leaky ReLU is that you have another parameter to tune: the slope. But again, which function works better really depends on your problem.
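As an illustration of that extra knob: in a framework such as PyTorch the slope shows up as an explicit hyperparameter, which you may end up searching over alongside everything else (a minimal sketch, with layer sizes chosen arbitrarily):

```python
import torch.nn as nn

# negative_slope is the extra hyperparameter Leaky ReLU introduces;
# 0.01 is the usual default, but values like 0.1 or 0.2 are also common.
model = nn.Sequential(
    nn.Linear(64, 32),
    nn.LeakyReLU(negative_slope=0.01),  # compare with nn.ReLU(), which has no such knob
    nn.Linear(32, 10),
)
```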

The advantage:
Leaky ReLU is "immortal".
If you train a ReLU network long enough, some neurons are going to die (especially with L1 or L2 regularization). Detecting dead neurons is hard; correcting them is even harder.
The disadvantage:
You add computational work on every epoch (it is harder to multiply by a slope than to assign a zero).
Depending on the task, you may need a few more epochs to converge.
The slope at negative z is another parameter, but not a very critical one.
Once you reach small learning rates, a dead neuron tends to remain dead.
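One rough way to spot dead ReLU units (my own sketch, not a library feature): run a batch through the layer and flag units whose activation is zero for every sample, since those units get no gradient and cannot recover.

```python
import numpy as np

def dead_relu_fraction(pre_activations):
    # pre_activations: (n_samples, n_units) values fed into ReLU for one layer.
    # A unit is "dead" on this batch if its ReLU output is zero for every sample,
    # i.e. all its pre-activations are <= 0, so no gradient flows back through it.
    activations = np.maximum(0.0, pre_activations)
    dead = np.all(activations == 0.0, axis=0)
    return dead.mean()

z = np.random.randn(256, 100) - 3.0   # shifted negative so many units are inactive
print(f"dead units: {dead_relu_fraction(z):.0%}")
```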
