12

I am confused about the correct way to use the initial state tensor for RNNs in TensorFlow. There is almost a 50/50 split between tutorials that use LSTMStateTuple and those that use cell.zero_state.

Are the two the same? If so, why are there two ways?

In one example they use tf.nn.rnn_cell.LSTMStateTuple to set the initial state and in the other they use cell.zero_state().

Why are there two methods? When should you prefer one over the other? Can you only use LSTMStateTuple when you set state_is_tuple? If so, does cell.zero_state() no longer work?

nbro
user3139545
  • The two are different things. `state_is_tuple` is used on LSTM cells, because LSTM cells' state is a tuple. `cell.zero_state` is the initializer of the state for all RNN cells. – Mihail Burduja Apr 04 '17 at 08:51
  • 1
    See this for an explanation on why LSTM state is a tuple: http://colah.github.io/posts/2015-08-Understanding-LSTMs/ – Mihail Burduja Apr 04 '17 at 08:52
  • Yes but if you look at the two examples I provided they use two different approaches for setting up the initial state. – user3139545 Apr 04 '17 at 09:00
  • 1
    The first one is more explicit. `cell.zero_state` will initialize the correct state class depending on the RNN cell (state_is_tuple is true or false). This line `init_state = tf.nn.rnn_cell.LSTMStateTuple(cell_state, hidden_state)` can be interchanged with `cell.zero_state(batch_size)` where cell is defined before as `cell = tf.nn.rnn_cell.BasicLSTMCell(state_size, state_is_tuple=True)` – Mihail Burduja Apr 04 '17 at 09:03

1 Answer

11

The two are different things. state_is_tuple is used on LSTM cells because the state of LSTM cells is a tuple. cell.zero_state is the initializer of the state for all RNN cells.

You will generally prefer the cell.zero_state function, as it will initialize the required state class depending on whether state_is_tuple is true or not.

See this GitHub issue, where cell.zero_state is recommended: "use the zero_state function on the cell object".

Another reason you may want cell.zero_state is that it is agnostic to the type of the cell (LSTM, GRU, RNN), so you can do something like this:

if cell_type == 'GRU':
    cell = tf.nn.rnn_cell.GRUCell(state_size)
else:
    cell = tf.nn.rnn_cell.BasicLSTMCell(state_size, state_is_tuple=True)

init_state = cell.zero_state(batch_size, tf.float32)

and the initial state will be set up correctly either way.
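As a quick sketch of this (written against the TF 1.x-era API through tf.compat.v1 so it also runs under TF 2; the sizes here are illustrative, not from the question):

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# A cell whose state is a tuple (c, h).
cell = tf.nn.rnn_cell.BasicLSTMCell(4, state_is_tuple=True)

# zero_state picks the right state class for the cell automatically.
init_state = cell.zero_state(batch_size=2, dtype=tf.float32)

# With state_is_tuple=True, this is an LSTMStateTuple of zero tensors.
print(type(init_state).__name__)  # LSTMStateTuple
```

If you later swap BasicLSTMCell for GRUCell, this line does not need to change; zero_state returns whatever state structure that cell expects.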

LSTMStateTuple will work only on cells that have the state as a tuple.

When to use LSTMStateTuple?

You'll want to use LSTMStateTuple when you're initializing your state with custom values (passed by the trainer). cell.zero_state() will return the state with all the values equal to 0.0.
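A sketch of what that can look like (again the TF 1.x-style API via tf.compat.v1; the placeholder shapes and names are made up for illustration):

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

batch_size, state_size = 2, 4  # illustrative sizes

# The trainer feeds custom values for both parts of the LSTM state.
cell_state = tf.placeholder(tf.float32, [batch_size, state_size])
hidden_state = tf.placeholder(tf.float32, [batch_size, state_size])

# Pack them into the tuple structure the LSTM cell expects (c, h).
init_state = tf.nn.rnn_cell.LSTMStateTuple(cell_state, hidden_state)
```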

If you want to keep state between batches, then you'll have to fetch it after each batch and pass it back through feed_dict with the next batch.
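A minimal sketch of that pattern (TF 1.x-style API via tf.compat.v1; the random batches and all sizes stand in for real data):

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

batch_size, num_steps, state_size = 2, 3, 4  # illustrative sizes

inputs = tf.placeholder(tf.float32, [batch_size, num_steps, 1])
c_in = tf.placeholder(tf.float32, [batch_size, state_size])
h_in = tf.placeholder(tf.float32, [batch_size, state_size])
init_state = tf.nn.rnn_cell.LSTMStateTuple(c_in, h_in)

cell = tf.nn.rnn_cell.BasicLSTMCell(state_size, state_is_tuple=True)
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=init_state)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # The first batch starts from zeros; after that, the final state of
    # one batch is fed back in as the initial state of the next.
    c = np.zeros([batch_size, state_size], np.float32)
    h = np.zeros([batch_size, state_size], np.float32)
    for _ in range(5):  # stand-in for iterating over real batches
        batch = np.random.rand(batch_size, num_steps, 1).astype(np.float32)
        c, h = sess.run(final_state,
                        feed_dict={inputs: batch, c_in: c, h_in: h})
```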

See this for an explanation on why LSTM state is a tuple.

nbro
Mihail Burduja
  • 2
    Keeping state between batches is not a reason to use LSTMStateTuple? I can do that with cell.zero_state() also...? – user3139545 Apr 04 '17 at 10:40
  • 1
    You're right. You can do that with `cell.zero_state()` too, which is why cell.zero_state is recommended in most cases. Still, there is the option to pass an LSTMStateTuple between batches. – Mihail Burduja Apr 04 '17 at 12:04