
[Image: the LSTM cell equations]

In an LSTM cell, there are five equations: three for the gates and two for the cell states.

The forget gate, input gate, and output gate (I'm not sure those are the correct names) use the sigmoid, which activates in [0, 1].

In contrast, Ct' and Ht use tanh, which activates in [-1, 1].
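For reference, since the image did not survive, this is the standard LSTM formulation I assume the question refers to (notation from the common literature; Ct' corresponds to $\tilde{C}_t$ below, and the plain cell-state update $C_t$ applies no activation, which is probably why it is not counted among the five):

$$
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) \\
\tilde{C}_t &= \tanh(W_C [h_{t-1}, x_t] + b_C) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$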

I could not find an explanation of why these different activation functions are used.
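To make the question concrete, here is a minimal NumPy sketch of a single LSTM step; the function name `lstm_step` and all shapes are my own illustrative assumptions, not any particular library's API:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W: (4*H, D+H), b: (4*H,). Illustrative only."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b   # all four pre-activations at once
    f = sigmoid(z[0*H:1*H])        # forget gate   in (0, 1): "how much to keep"
    i = sigmoid(z[1*H:2*H])        # input gate    in (0, 1): "how much to write"
    o = sigmoid(z[2*H:3*H])        # output gate   in (0, 1): "how much to expose"
    c_tilde = np.tanh(z[3*H:4*H])  # candidate values in (-1, 1): signed content
    c = f * c_prev + i * c_tilde   # new cell state (no activation on the update itself)
    h = o * np.tanh(c)             # hidden state: squash cell into (-1, 1), then gate
    return h, c

# Tiny usage example with random weights
D, H = 3, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * H, D + H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

The sketch shows the pattern the question is asking about: sigmoid wherever a value acts as a multiplicative 0-to-1 "how much" factor, and tanh wherever a value carries signed content.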

Roy Lee
  • Sigmoid output is not [-1, 1]; it is [0, 1]. – Bhanu Tez Feb 19 '19 at 05:12
  • @BhanuTez, sorry for the confusion; I edited it. – Roy Lee Feb 19 '19 at 06:22
  • 1
    Check out this thread, maybe it'll be helpful: https://stackoverflow.com/questions/40761185/what-is-the-intuition-of-using-tanh-in-lstm – amityadav Feb 19 '19 at 07:12
  • @amityadav, thanks for your kindness; it is helpful. But I'm still confused about why the tanh function is there. – Roy Lee Feb 19 '19 at 08:02
  • @RoyLee From the link above, the best reason seems to be "to overcome the vanishing gradient problem, we need a function whose second derivative can sustain for a long range before going to zero". This makes a lot of sense, considering that simple RNNs suffer from vanishing gradients a lot. GRU and LSTM cells solve this problem, and both use tanh since, as mentioned in the comment, its higher-order gradients can sustain over a long range before going towards zero. But perhaps someone else has a clearer understanding of this. – amityadav Feb 19 '19 at 08:11
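To illustrate the gradient argument in the comment above numerically, here is a small sketch (my own, not from the linked thread) comparing how quickly the first derivatives of sigmoid and tanh decay away from zero:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-4, 4, 9)
d_sigmoid = sigmoid(x) * (1 - sigmoid(x))   # sigma'(x), peaks at 0.25
d_tanh = 1 - np.tanh(x) ** 2                # tanh'(x), peaks at 1.0

for xi, ds, dt in zip(x, d_sigmoid, d_tanh):
    print(f"x={xi:+.1f}  sigmoid'={ds:.4f}  tanh'={dt:.4f}")

# tanh's derivative reaches 1.0 while sigmoid's caps at 0.25, so repeated
# multiplication through time shrinks gradients more slowly with tanh.
```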

0 Answers