
[Image: the LSTM cell equations]

In an LSTM cell, there are five equations: three for the gates and two for the cell states.

The forget gate, input gate, and output gate (I'm not sure those are the correct names) use the sigmoid, which activates in [0, 1].

In contrast, Ct' and Ht use tanh, which activates in [-1, 1].
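For reference, since the image did not survive, this is the standard LSTM formulation I assume the question refers to (notation from the common literature; Ct' corresponds to $\tilde{C}_t$ below, and the plain cell-state update $C_t$ applies no activation, which is probably why it is not counted among the five):

$$
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) \\
\tilde{C}_t &= \tanh(W_C [h_{t-1}, x_t] + b_C) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$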

I could not find an explanation of why these different activation functions are used.
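To make the question concrete, here is a minimal NumPy sketch of a single LSTM step; the function name `lstm_step` and all shapes are my own illustrative assumptions, not any particular library's API:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W: (4*H, D+H), b: (4*H,). Illustrative only."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b   # all four pre-activations at once
    f = sigmoid(z[0*H:1*H])        # forget gate   in (0, 1): "how much to keep"
    i = sigmoid(z[1*H:2*H])        # input gate    in (0, 1): "how much to write"
    o = sigmoid(z[2*H:3*H])        # output gate   in (0, 1): "how much to expose"
    c_tilde = np.tanh(z[3*H:4*H])  # candidate values in (-1, 1): signed content
    c = f * c_prev + i * c_tilde   # new cell state (no activation on the update itself)
    h = o * np.tanh(c)             # hidden state: squash cell into (-1, 1), then gate
    return h, c

# Tiny usage example with random weights
D, H = 3, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * H, D + H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

The sketch shows the pattern the question is asking about: sigmoid wherever a value acts as a multiplicative 0-to-1 "how much" factor, and tanh wherever a value carries signed content.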

Roy Lee
  • Sigmoid output is not [-1, 1]; it is [0, 1]. – Bhanu Tez Feb 19 '19 at 05:12
  • @BhanuTez, sorry for the confusion; I edited it. – Roy Lee Feb 19 '19 at 06:22
  • 1
    Check out this thread, maybe it'll be helpful: https://stackoverflow.com/questions/40761185/what-is-the-intuition-of-using-tanh-in-lstm – amityadav Feb 19 '19 at 07:12
  • @amityadav, thanks for your kindness; it is helpful. But I'm still confused about why the tanh function is there. – Roy Lee Feb 19 '19 at 08:02
  • @RoyLee From the link above, the best reason seems to be "to overcome the vanishing gradient problem, we need a function whose second derivative can sustain for a long range before going to zero". This makes a lot of sense, considering that simple RNNs suffer from vanishing gradients a lot. GRU and LSTM cells solve this problem, and both use tanh since, as mentioned in the comment, its higher-order gradients can sustain over a long range before going towards zero. But perhaps someone else has a clearer understanding of this. – amityadav Feb 19 '19 at 08:11
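To illustrate the gradient argument in the comment above numerically, here is a small sketch (my own, not from the linked thread) comparing how quickly the first derivatives of sigmoid and tanh decay away from zero:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-4, 4, 9)
d_sigmoid = sigmoid(x) * (1 - sigmoid(x))   # sigma'(x), peaks at 0.25
d_tanh = 1 - np.tanh(x) ** 2                # tanh'(x), peaks at 1.0

for xi, ds, dt in zip(x, d_sigmoid, d_tanh):
    print(f"x={xi:+.1f}  sigmoid'={ds:.4f}  tanh'={dt:.4f}")

# tanh's derivative reaches 1.0 while sigmoid's caps at 0.25, so repeated
# multiplication through time shrinks gradients more slowly with tanh.
```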

0 Answers