
I am trying to understand the difference between the model described here (the following one):

from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model

# timesteps, input_dim and latent_dim are assumed to be defined beforehand
inputs = Input(shape=(timesteps, input_dim))
encoded = LSTM(latent_dim)(inputs)  # encode the whole input sequence into a single latent vector

decoded = RepeatVector(timesteps)(encoded)  # repeat that vector to rebuild a sequence of length timesteps
decoded = LSTM(input_dim, return_sequences=True)(decoded)

sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)

and the sequence-to-sequence model described here (the second description).

What is the difference? The first one has the RepeatVector while the second does not? Is the first model not taking the decoder's hidden state as the initial state for the prediction?

Is there a paper describing the first and the second one?

1 Answer


In the model using RepeatVector, they're not doing any kind of fancy prediction, nor dealing with states explicitly. They're letting the model handle everything internally, and the RepeatVector is used to transform a (batch, latent_dim) tensor (which is not a sequence) into a (batch, timesteps, latent_dim) tensor (which now is a proper sequence).
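A minimal sketch of that reshaping effect (the sizes used here are just illustrative assumptions):

import numpy as np
from keras.layers import Input, RepeatVector
from keras.models import Model

latent_dim, timesteps = 8, 5

vec_in = Input(shape=(latent_dim,))        # (batch, latent_dim): a single vector per sample
seq_out = RepeatVector(timesteps)(vec_in)  # (batch, timesteps, latent_dim): now a proper sequence
repeater = Model(vec_in, seq_out)

print(repeater.predict(np.zeros((2, latent_dim))).shape)  # -> (2, 5, 8)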

Now, in the other model, without RepeatVector, the secret lies in this additional function:

import numpy as np

# encoder_model, decoder_model, num_decoder_tokens, target_token_index,
# reverse_target_char_index and max_decoder_seq_length are defined earlier
# in the same example.
def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)

    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, target_token_index['\t']] = 1.

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True

        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.

        # Update states
        states_value = [h, c]

    return decoded_sentence

This runs a "loop" based on a stop_condition for creating the time steps one by one. (The advantage of this is making sentences without a fixed length).

It also explicitly takes the states generated at each step and feeds them back in (in order to keep the proper connection between the individual steps).
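For completeness, decode_sequence relies on two inference-time models that expose the LSTM states. Below is a minimal sketch of how they might be wired up, following the structure of the standard Keras character-level seq2seq example; the layer names and sizes are assumptions for illustration, not something defined in this answer.

from keras.layers import Input, LSTM, Dense
from keras.models import Model

# Illustrative sizes (assumptions)
latent_dim, num_encoder_tokens, num_decoder_tokens = 256, 70, 90

# Training-time graph
encoder_inputs = Input(shape=(None, num_encoder_tokens))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)
encoder_states = [state_h, state_c]

decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_outputs = decoder_dense(decoder_outputs)

# Inference-time models used by decode_sequence
encoder_model = Model(encoder_inputs, encoder_states)  # returns [h, c] for an input sequence

state_h_in = Input(shape=(latent_dim,))
state_c_in = Input(shape=(latent_dim,))
step_outputs, step_h, step_c = decoder_lstm(decoder_inputs,
                                            initial_state=[state_h_in, state_c_in])
step_outputs = decoder_dense(step_outputs)
decoder_model = Model([decoder_inputs, state_h_in, state_c_in],
                      [step_outputs, step_h, step_c])  # one decoding step at a time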


In short:

  • Model 1: creates the output length by repeating the latent vector timesteps times
  • Model 2: creates the output length by looping, generating new steps one by one until a stop condition is reached
Daniel Möller
  • You said you can make sentences without a fixed length; would this also be possible with time-series data? If yes, would you be able to provide an example based on a shape of (None, timesteps, number of features)? – texaspythonic Jul 24 '18 at 13:18
  • There is no difference between sentences and timeseries, both are data shaped like `(batch, steps, features)` that are blindly interpreted by the model. --- You can follow exactly the same procedure they used in this model, or you can maybe follow the stateful approach I created in these answers: https://stackoverflow.com/questions/47594861/predicting-a-multiple-time-step-forward-of-a-time-series-using-lstm/47719094#47719094 --- https://stackoverflow.com/questions/48760472/how-to-use-the-keras-model-to-forecast-for-future-dates-or-events/48807811#48807811 – Daniel Möller Jul 24 '18 at 13:30
  • I tried something out, but my prediction requires the same batch size as in training and is not variable. I will make another post with my code for that. – texaspythonic Jul 24 '18 at 14:48
  • Please have a look here: https://stackoverflow.com/questions/51501726/variable-input-for-sequence-to-sequence-autoencoder – texaspythonic Jul 24 '18 at 14:54
  • Another question: is the model with the RepeatVector also using the encoder's state as the decoder's initial state in the prediction? – texaspythonic Jul 24 '18 at 15:48
  • No, that model uses only the "latent data", nothing else. (It creates its own states as it progresses through the steps, but internally and not seen by the user.) – Daniel Möller Jul 24 '18 at 16:34