
Here is my understanding of a basic Sequence to Sequence LSTM. Suppose we are tackling a question-answer setting.

You have two sets of LSTMs (green and blue below), with weights shared within each set (i.e. each of the 4 green cells has the same weights, and similarly for the blue cells). The first is a many to one LSTM, which summarises the question in the last hidden state / cell memory.

The second set (blue) is a many to many LSTM, with weights different from the first set. Its input is simply the answer sentence, while its output is the same sentence shifted by one.

The question is twofold:

1. Are we passing only the last hidden state to the blue LSTMs as the initial hidden state, or both the last hidden state and the cell memory?
2. Is there a way to set the initial hidden state and cell memory in Keras or TensorFlow? If so, a reference?

http://suriyadeepan.github.io/img/seq2seq/seq2seq2.png (image taken from suriyadeepan.github.io)

sachinruk
  • I have lots of questions about your question.... 1 - what do you mean by "each set respectively sharing weights"? -- 2- What do you understand by cell memory and cell state? -- 3 -- Why is blue many to many if the picture says it gets the "thought vector" as input? -- 4 -- Why does the blue get the answer and outputs a shifted answer? Where does the question sentence come in? ---- One thing I can say is: only "outputs" are passed from one layer to another. – Daniel Möller Sep 22 '17 at 05:10
  • 1. Answered in parentheses above. 2. I think I meant cell state (basically one of the two things that gets passed out of the LSTM according to colah's blog). 3. I don't understand the thought vector (it's what the entire question is about), but it is many to many without that. Look at how the output loops back into the input. 4. This is training time only; during testing you just take the highest probability output (or beam search). – sachinruk Sep 22 '17 at 05:18

2 Answers

  1. Are we passing only the last hidden state to the blue LSTMs as the initial hidden state, or both the last hidden state and the cell memory?

Both hidden state h and cell memory c are passed to the decoder.

TensorFlow

In the seq2seq source code, you can find the following in basic_rnn_seq2seq():

_, enc_state = rnn.static_rnn(enc_cell, encoder_inputs, dtype=dtype)
return rnn_decoder(decoder_inputs, enc_state, cell)

If you use an LSTMCell, the returned enc_state from the encoder will be a tuple (c, h). As you can see, the tuple is passed directly to the decoder.
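As a quick illustration (TF 1.x API; the shapes and sizes below are made up for the example, not taken from the tutorial), running static_rnn with an LSTMCell yields an LSTMStateTuple carrying both c and h:

import tensorflow as tf  # TF 1.x

# Illustrative: 4 time steps, batch of 2, 8 input features, 16 units.
encoder_inputs = [tf.placeholder(tf.float32, [2, 8]) for _ in range(4)]
enc_cell = tf.nn.rnn_cell.LSTMCell(16)

_, enc_state = tf.nn.static_rnn(enc_cell, encoder_inputs, dtype=tf.float32)
print(enc_state)  # LSTMStateTuple(c=<Tensor (2, 16)>, h=<Tensor (2, 16)>)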

Keras

In Keras, the "state" defined for an LSTMCell is also a tuple (h, c) (note that the order is different from TF). In LSTMCell.call(), you can find:

    h_tm1 = states[0]
    c_tm1 = states[1]

To get the states returned from an LSTM layer, you can specify return_state=True. The returned value is a tuple (o, h, c). The tensor o is the output of this layer, which will be equal to h unless you specify return_sequences=True.
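For instance (a small sketch; the layer size and input shape are only illustrative):

from keras.layers import Input, LSTM

x = Input(shape=(None, 10))               # (timesteps, features), illustrative
o, h, c = LSTM(32, return_state=True)(x)
# o: last output, shape (batch, 32) -- equal to h here since return_sequences=False
# h: last hidden state, shape (batch, 32)
# c: last cell state, shape (batch, 32)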

  2. Is there a way to set the initial hidden state and cell memory in Keras or TensorFlow? If so, a reference?

TensorFlow

Just provide the initial state to an LSTMCell when calling it. For example, in the official RNN tutorial:

lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
...
    output, state = lstm(current_batch_of_words, state)

There's also an initial_state argument for functions such as tf.nn.static_rnn. If you use the seq2seq module, provide the states to rnn_decoder as shown in the code for question 1.
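A hedged sketch of that argument (TF 1.x; the cell size, batch size and placeholders are my own, and in a real seq2seq model init_state would be the encoder's final state rather than zeros):

import tensorflow as tf  # TF 1.x

cell = tf.nn.rnn_cell.BasicLSTMCell(16)
inputs = [tf.placeholder(tf.float32, [2, 8]) for _ in range(4)]  # illustrative

init_state = cell.zero_state(batch_size=2, dtype=tf.float32)     # or the encoder's enc_state
outputs, final_state = tf.nn.static_rnn(cell, inputs, initial_state=init_state)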

Keras

Use the keyword argument initial_state in the LSTM function call.

out = LSTM(32)(input_tensor, initial_state=(h, c))

You can actually find this usage in the official documentation:

Note on specifying the initial state of RNNs

You can specify the initial state of RNN layers symbolically by calling them with the keyword argument initial_state. The value of initial_state should be a tensor or list of tensors representing the initial state of the RNN layer.
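Putting both answers together, a minimal encoder-decoder sketch (the layer size, feature dimension and variable names are illustrative assumptions, not prescribed by the question):

from keras.layers import Input, LSTM, Dense
from keras.models import Model

num_features = 100   # e.g. one-hot word vectors (illustrative)
latent_dim = 32

# Encoder: many to one, also returns its final h and c.
encoder_inputs = Input(shape=(None, num_features))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)

# Decoder: many to many, initialised with the encoder's states.
decoder_inputs = Input(shape=(None, num_features))
decoder_outputs = LSTM(latent_dim, return_sequences=True)(
    decoder_inputs, initial_state=[state_h, state_c])
decoder_outputs = Dense(num_features, activation='softmax')(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)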


EDIT:

There's now an example script in Keras (lstm_seq2seq.py) showing how to implement basic seq2seq in Keras. Making predictions after training a seq2seq model is also covered in this script.

Yu-Yang
  • Could you put a link to the part where you found the information on seq2seq `enc_state` in TensorFlow? Just have a feeling you were looking at the source by the looks of it. And thanks heaps, great write up! – sachinruk Sep 24 '17 at 01:58
  • Yes I looked at the source code for it. I've added a link to the code. Those lines come from the `basic_rnn_seq2seq()` function (which is the function used in the official tutorial). If you execute the `rnn.static_rnn()` line, the returned `enc_state` will be a `LSTMStateTuple(c=..., h=...)`. – Yu-Yang Sep 24 '17 at 04:30
  • [link](https://github.com/keras-team/keras-io/blob/master/examples/nlp/lstm_seq2seq.py) seems to have changed – JeeyCi May 18 '22 at 07:43

(Edit: this answer is incomplete and hasn't considered actual possibilities of state transferring. See the accepted answer.)

From a Keras point of view, that picture has only two layers.

  • The green group is one LSTM layer.
  • The blue group is another LSTM layer.

There isn't any communication between green and blue other than passing the outputs. So, the answer for 1 is:

  1. Only the thought vector (which is the actual output of the layer) is passed to the other layer.

Memory and state (not sure if these are two different entities) are totally contained inside a single layer and are not initially intended to be seen or shared with any other layer.

Each individual block in that image is totally invisible in keras. They are considered "time steps", something that only appears in the shape of the input data. It's rarely important to worry about them (unless for very advanced usages).

In keras, it's like this:

[image: the same two layers as Keras sees them, each LSTM layer drawn as a single block with only its input and output arrows exposed]

You only have easy access to the external arrows (including the "thought vector").
Access to each individual step (each green block in your picture) is not exposed. So...

  2. Passing the states from one layer to the other is also not expected in Keras. You will probably have to hack things. (See this: https://github.com/fchollet/keras/issues/2995)

But with a thought vector big enough, you could say it will learn to carry what is important within itself.

The only notion you have from the steps is:

  • You have to input things shaped like (sentences, length, wordIdFeatures)

The steps will be performed considering that each slice in the length dimension is an input to each green block.

  • You may choose to have a single output (sentences, cells), for which you completely lose track of steps. Or...

  • Outputs like (sentences, length, cells), from which you know the output of each block through the length dimension.
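A quick sketch of those two output shapes (all the numbers here are only illustrative):

from keras.layers import Input, LSTM
from keras.models import Model
import numpy as np

x = Input(shape=(7, 20))                        # length=7, wordIdFeatures=20
single = LSTM(32)(x)                            # (sentences, cells)
per_step = LSTM(32, return_sequences=True)(x)   # (sentences, length, cells)

model = Model(x, [single, per_step])
a, b = model.predict(np.zeros((3, 7, 20)))
print(a.shape, b.shape)                         # (3, 32) (3, 7, 32)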

One to many or many to many?

Now, the first layer is many to one (but nothing prevents it from being many to many too if you want).

But the second... that's complicated.

  • If the thought vector was made by a many to one, you will have to manage a way of creating a one to many. (That's not trivial in Keras, but you could repeat the thought vector for the expected length, making it the input to every step; see the sketch below. Or maybe fill an entire sequence with zeros or ones, keeping only the first element as the thought vector.)
  • If the thought vector was made by a many to many, you can take advantage of this and keep an easy many to many, if you're willing to accept that the output has exactly the same number of steps as the input.

Keras doesn't have a ready solution for 1 to many cases (predicting a whole sequence from a single input).
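One common workaround for the "repeat the thought vector" idea above is RepeatVector (only a sketch; the fixed answer length of 10 and the feature size of 100 are assumptions):

from keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense
from keras.models import Model

question = Input(shape=(None, 100))                  # 100 features, illustrative
thought = LSTM(32)(question)                         # many to one: the thought vector
repeated = RepeatVector(10)(thought)                 # copy it to every decoder step
answer = LSTM(32, return_sequences=True)(repeated)   # many to many over the copies
answer = TimeDistributed(Dense(100, activation='softmax'))(answer)

model = Model(question, answer)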

Daniel Möller
  • `but nothing prevents it from being many to many too if you want`. This is wrong actually. You are probably thinking of trying to predict the next word, which isn't what I'm trying to do here. I want to summarise the question in the last cell state / hidden layer and then pass this on to the answer LSTMs. This is strictly many to many. I think you ought to take a look at the tensorflow seq2seq documentation before you answer this. – sachinruk Sep 22 '17 at 07:45
  • Wow, you could at least not be rude to Daniel, who took some time to write you a very good answer. And as far as I know, you can do a `many to many` or a `one to many`. It depends if you want the result of each timestep in the green LSTM layer to be fed to the blue LSTM layer. – BenDes Sep 22 '17 at 13:54
  • I am really sorry for coming off so blunt, didn't mean to sound rude. It wasn't me who downvoted the original, +1. I really do appreciate the time and effort you put into making this answer. Again, sorry for being a dick. – sachinruk Sep 24 '17 at 04:46
  • Hmmm, it seems my answer wasn't really worthy after all. Looking at the accepted answer, mine is simply wrong. – Daniel Möller Sep 24 '17 at 06:49