While studying LSTMs, I learned that the input gate uses two different activation functions: sigmoid and tanh. I understand the purpose of the sigmoid, but not the tanh. This Stack Overflow answer about the use of tanh says that we want its second derivative to be sustained for a long time before going to zero, and I don't understand why the author is talking about the second derivative. Also, he seems to say (in the second paragraph) that tanh eliminates the vanishing gradient, but all the articles I have read say that Leaky ReLU is what helps eliminate it. So I want to understand the role of tanh in an LSTM.

This is not a duplicate question; I just want to understand the previously answered question. Thank you!
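To make the question concrete, here is my own minimal sketch of one LSTM step (the parameter names and layout are made up for illustration, not taken from the linked answer): the sigmoids produce gate values in (0, 1) that scale how much is written, kept, or exposed, while the tanh produces the candidate content in (-1, 1).

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step; parameter layout is illustrative only."""
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])   # input gate: (0, 1), how much to write
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])   # forget gate: (0, 1), how much to keep
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])   # output gate: (0, 1), how much to expose
    g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])   # candidate values: (-1, 1), the content

    c = f * c_prev + i * g      # sigmoid gates scale; tanh supplies the signed content
    h = o * np.tanh(c)          # tanh squashes the cell state before it is emitted
    return h, c

# Tiny usage example with random weights (input size 3, hidden size 4).
n_in, n_h = 3, 4
W = {k: rng.standard_normal((n_h, n_in)) for k in "ifog"}
U = {k: rng.standard_normal((n_h, n_h)) for k in "ifog"}
b = {k: np.zeros(n_h) for k in "ifog"}
h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_h), np.zeros(n_h), W, U, b)
print(h, c)
```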

Gautam Goyal
- I'm voting to close this question because it is not about programming as defined in the [help] but about ML theory and/or methodology - please see the intro and NOTE in the `machine-learning` [tag info](https://stackoverflow.com/tags/machine-learning/info). – desertnaut Jun 28 '21 at 13:26
- It's not such a big issue; at least don't close this question if you can't help it. My account is also going to be terminated, so please. – Gautam Goyal Jun 28 '21 at 19:10