
I've read another post on here that discusses the intuition behind Tanh functions but it doesn't quite help me understand how the sigmoid and activation functions are forgetting and including information.

I guess I would like to understand what is happening to the data as it passes through these functions in an LSTM.

madsthaks

1 Answer

It's easier to look at a schematic drawing of an LSTM cell:

(schematic of an LSTM cell, showing the input x_t, the cell state c_t, and the three gates)

So I guess you have already read in the other question that sigmoid and tanh functions have a fixed output range: (0, 1) for sigmoid and (-1, 1) for tanh. Both are bounded above and below.
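Just to make those ranges concrete, here is a minimal sketch (plain Python, no ML library) showing how both functions squash arbitrary inputs into their bounded ranges:

```python
import math

def sigmoid(x):
    # squashes any real input into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

# tanh squashes into (-1, 1); math.tanh is the standard-library version
for x in (-10.0, 0.0, 10.0):
    print(f"x={x:6.1f}  sigmoid={sigmoid(x):.4f}  tanh={math.tanh(x):.4f}")
```

Even for large positive or negative inputs, sigmoid stays strictly between 0 and 1, and tanh strictly between -1 and 1, which is what makes them usable as gates and as bounded cell-state updates.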

As you can see in the picture above, there are 3 gates. But contrary to what you might believe, these gates aren't actually connected in a feedforward manner to any other neuron in the cell.

The gates are connected to connections instead of neurons. Weird, huh! Let me explain: x_t projects a connection to c_t, and that connection has a certain multiplier (a.k.a. weight). So the input from x_t to c_t becomes x_t * weight.

But that's not all. The gate adds another multiplier to that calculation. So instead of x_t * weight, it becomes x_t * weight * gate. Which for the input gate, is equivalent to x_t * weight * i_t.

Basically, the activation value of i_t gets multiplied with the value coming from x_t. So if i_t is close to 1, the value from x_t passes through to c_t almost unchanged; if i_t is low, the input is attenuated, and at i_t = 0 the input from x_t is disabled entirely.
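The gating arithmetic above can be sketched with made-up scalar numbers (in a real LSTM these are vectors and the gate is computed from the current input and previous hidden state, but the multiplication works the same way):

```python
import math

def sigmoid(x):
    # gate activation: output always lies in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical scalar values, chosen only to illustrate the idea
x_t = 0.8        # input value at time t
weight = 1.5     # weight on the connection from x_t toward c_t

# Two extreme gate pre-activations: strongly positive vs. strongly negative
i_open = sigmoid(6.0)    # gate nearly fully open, close to 1
i_shut = sigmoid(-6.0)   # gate nearly closed, close to 0

# The gate multiplies the weighted input, scaling how much reaches c_t
contribution_open = x_t * weight * i_open   # almost the full x_t * weight
contribution_shut = x_t * weight * i_shut   # almost nothing gets through
print(contribution_open, contribution_shut)
```

So "including" information just means the gate outputs a value near 1, and "forgetting" (for the forget gate acting on the previous cell state) means it outputs a value near 0.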

Thomas Wagenaar
  • That is really interesting; this drawing is definitely a really neat way to represent an LSTM. So, is my understanding that the LSTM forgets information prior to adding new information to the cell state incorrect? – madsthaks Jun 26 '17 at 23:52
  • Additionally, am I right to assume that the initial activation value is arbitrary and the value coming from `x_t` will vary throughout training? I guess what I'm trying to understand is how these gates decide what is important (high value) or not important (low value)? – madsthaks Jun 26 '17 at 23:58