
I've read another post on here that discusses the intuition behind Tanh functions but it doesn't quite help me understand how the sigmoid and activation functions are forgetting and including information.

I guess I would like to understand what is happening to the data as it passes through these functions in an LSTM.

madsthaks

1 Answer

It's easier to look at a schematic drawing of an LSTM cell:

(schematic of an LSTM cell, showing the input x_t, the cell state c_t, and the three gates)

So I guess you have already read in the other question that sigmoid and tanh functions have a fixed output range: (0, 1) for sigmoid and (-1, 1) for tanh. Both are bounded above and below.
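Just to make those ranges concrete, here is a minimal sketch (plain Python, no ML library) showing how both functions squash arbitrary inputs into their bounded ranges:

```python
import math

def sigmoid(x):
    # squashes any real input into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

# tanh squashes into (-1, 1); math.tanh is the standard-library version
for x in (-10.0, 0.0, 10.0):
    print(f"x={x:6.1f}  sigmoid={sigmoid(x):.4f}  tanh={math.tanh(x):.4f}")
```

Even for large positive or negative inputs, sigmoid stays strictly between 0 and 1, and tanh strictly between -1 and 1, which is what makes them usable as gates and as bounded cell-state updates.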

As you can see in the picture above, there are 3 gates. But contrary to what you might believe, these gates aren't actually connected in a feedforward manner to any other neuron in the cell.

The gates are connected to connections instead of neurons. Weird, huh! Let me explain: x_t projects a connection to c_t, and that connection has a certain multiplier (a.k.a. weight). So the input from x_t to c_t becomes x_t * weight.

But that's not all. The gate adds another multiplier to that calculation. So instead of x_t * weight, it becomes x_t * weight * gate. Which for the input gate, is equivalent to x_t * weight * i_t.

Basically, the activation value of i_t gets multiplied with the value coming from x_t. So if i_t is close to 1, the value from x_t passes through to c_t almost unchanged; if i_t is low, the input is attenuated, and at i_t = 0 the input from x_t is disabled entirely.
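The gating arithmetic above can be sketched with made-up scalar numbers (in a real LSTM these are vectors and the gate is computed from the current input and previous hidden state, but the multiplication works the same way):

```python
import math

def sigmoid(x):
    # gate activation: output always lies in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical scalar values, chosen only to illustrate the idea
x_t = 0.8        # input value at time t
weight = 1.5     # weight on the connection from x_t toward c_t

# Two extreme gate pre-activations: strongly positive vs. strongly negative
i_open = sigmoid(6.0)    # gate nearly fully open, close to 1
i_shut = sigmoid(-6.0)   # gate nearly closed, close to 0

# The gate multiplies the weighted input, scaling how much reaches c_t
contribution_open = x_t * weight * i_open   # almost the full x_t * weight
contribution_shut = x_t * weight * i_shut   # almost nothing gets through
print(contribution_open, contribution_shut)
```

So "including" information just means the gate outputs a value near 1, and "forgetting" (for the forget gate acting on the previous cell state) means it outputs a value near 0.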

Thomas Wagenaar
  • That is really interesting; this drawing is definitely a really neat way to represent an LSTM. So, is my understanding that the LSTM forgets information prior to adding new information to the cell state incorrect? – madsthaks Jun 26 '17 at 23:52
  • Additionally, am I right to assume that the initial activation value is arbitrary and the value coming from `x_t` will vary throughout training? I guess what I'm trying to understand is how these gates decide what is important (high value) or not important (low value)? – madsthaks Jun 26 '17 at 23:58