- Are we passing only the last hidden state to the blue LSTMs as the initial hidden state, or is it the last hidden state and the cell memory?
Both the hidden state h and the cell memory c are passed to the decoder.
TensorFlow
In the seq2seq source code, you can find the following in basic_rnn_seq2seq():
_, enc_state = rnn.static_rnn(enc_cell, encoder_inputs, dtype=dtype)
return rnn_decoder(decoder_inputs, enc_state, cell)
If you use an LSTMCell, the enc_state returned from the encoder will be a tuple (c, h). As you can see, the tuple is passed directly to the decoder.
Keras
In Keras, the "state" defined for an LSTMCell is also a tuple, (h, c) (note that the order is different from TF). In LSTMCell.call(), you can find:
h_tm1 = states[0]
c_tm1 = states[1]
To get the states returned from an LSTM layer, you can specify return_state=True. The returned value is a tuple (o, h, c). The tensor o is the output of this layer, which will be equal to h unless you specify return_sequences=True.
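As a quick check of the return_state behaviour described above, here is a minimal sketch. It assumes TensorFlow 2.x with its bundled tf.keras; the batch size, sequence length and layer width are arbitrary values chosen for illustration.

```python
import numpy as np
import tensorflow as tf

# Toy batch: 4 sequences, 10 timesteps, 8 features (arbitrary sizes).
x = tf.constant(np.random.rand(4, 10, 8).astype("float32"))

# return_state=True returns the layer output plus the final h and c.
o, h, c = tf.keras.layers.LSTM(16, return_state=True)(x)

# With return_sequences=True, the output holds every timestep,
# and only its last timestep matches the final hidden state.
seq, h_seq, c_seq = tf.keras.layers.LSTM(
    16, return_sequences=True, return_state=True
)(x)

o, h, c = np.array(o), np.array(h), np.array(c)
seq, h_seq = np.array(seq), np.array(h_seq)

print(o.shape, h.shape, c.shape)          # output and both states
print(np.allclose(o, h))                  # o equals h without return_sequences
print(np.allclose(seq[:, -1, :], h_seq))  # last timestep equals h otherwise
```

The two layers have independent weights, so o and seq are not comparable to each other; each layer is only compared against its own final state.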
- Is there a way to set the initial hidden state and cell memory in Keras or TensorFlow? If so, is there a reference?
TensorFlow
Just provide the initial state to an LSTMCell when calling it. For example, in the official RNN tutorial:
lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
...
output, state = lstm(current_batch_of_words, state)
There's also an initial_state argument for functions such as tf.nn.static_rnn. If you use the seq2seq module, provide the states to rnn_decoder, as shown in the code for question 1.
Keras
Use the keyword argument initial_state in the LSTM function call.
out = LSTM(32)(input_tensor, initial_state=(h, c))
You can actually find this usage in the official documentation:
Note on specifying the initial state of RNNs
You can specify the initial state of RNN layers symbolically by calling them with the keyword argument initial_state. The value of initial_state should be a tensor or list of tensors representing the initial state of the RNN layer.
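Putting both answers together, the sketch below passes the encoder's final h and c to a decoder through initial_state. It assumes TensorFlow 2.x with tf.keras; the variable names (enc_in, dec_in, encoder, decoder) and all sizes are made up for illustration and are not taken from the Keras example script.

```python
import numpy as np
import tensorflow as tf

# Toy inputs: 4 source sequences of length 10 and 4 target sequences of length 6.
enc_in = tf.constant(np.random.rand(4, 10, 8).astype("float32"))
dec_in = tf.constant(np.random.rand(4, 6, 8).astype("float32"))

# Both layers must use the same number of units so the states are compatible.
encoder = tf.keras.layers.LSTM(16, return_state=True)
decoder = tf.keras.layers.LSTM(16, return_sequences=True, return_state=True)

# Discard the encoder output and keep only its final states.
_, enc_h, enc_c = encoder(enc_in)

# Keras expects the LSTM states in the order [h, c].
dec_out, dec_h, dec_c = decoder(dec_in, initial_state=[enc_h, enc_c])

dec_out_shape = tuple(dec_out.shape)
dec_h_shape = tuple(dec_h.shape)
print(dec_out_shape, dec_h_shape)
```

Keeping return_state=True on the decoder as well is a convenience for inference, where the decoder is typically run one step at a time and its own h and c are fed back in as the next initial_state.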
EDIT:
There's now an example script in Keras (lstm_seq2seq.py) showing how to implement a basic seq2seq model. The script also covers how to make predictions after training a seq2seq model.