
I implemented a sequence-to-sequence encoder-decoder, but I am having problems varying the target length at prediction time. It works when the target has the same length as the training sequences, but not when the length differs. What do I need to change?

from keras.models import Model
from keras.layers import Input, LSTM, Dense
import numpy as np

num_encoder_tokens = 2
num_decoder_tokens = 2
encoder_seq_length = None
decoder_seq_length = None
batch_size = 100
epochs = 2000
hidden_units = 10
timesteps = 10

input_seqs = np.random.random((1000, 10, num_encoder_tokens))
target_seqs = np.random.random((1000, 10, num_decoder_tokens))



#define training encoder
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder = LSTM(hidden_units, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
encoder_states = [state_h, state_c]
#define training decoder
decoder_inputs = Input(shape=(None,num_decoder_tokens))
decoder_lstm = LSTM(hidden_units, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='tanh')
decoder_outputs = decoder_dense(decoder_outputs)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

#Run training
model.compile(optimizer='adam', loss='mse')
model.fit([input_seqs, target_seqs], target_seqs, batch_size=batch_size, epochs=epochs)

#new target data
target_seqs = np.random.random((2000, 10, num_decoder_tokens))


# define inference encoder
encoder_model = Model(encoder_inputs, encoder_states)
# define inference decoder
decoder_state_input_h = Input(shape=(hidden_units,))
decoder_state_input_c = Input(shape=(hidden_units,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)

# Initialize states
states_values = encoder_model.predict(input_seqs)

Here it wants the same batch size as input_seqs and does not accept target_seqs with a batch of 2000:

target_seq = np.zeros((1, 1, num_decoder_tokens))
output = list()
for t in range(timesteps):
    # note: this passes the full target_seqs (batch of 2000), not the single-step target_seq
    output_tokens, h, c = decoder_model.predict([target_seqs] + states_values)
    output.append(output_tokens[0, 0, :])
    states_values = [h, c]
    target_seq = output_tokens

What do I need to change so that the model accepts a variable length of input?

2 Answers


You can create in your data a word/token that means end_of_sequence.

You pad every sequence to a maximum length and can use a Masking(mask_value) layer to avoid processing the padded steps.

In both the inputs and outputs, you add the end_of_sequence token and complete the missing steps with mask_value.

Example:

  • the longest sequence has 4 steps
    • make it 5 to add an end_of_sequence token:
      • [step1, step2, step3, step4, end_of_sequence]
  • consider a sequence that is shorter:
    • [step1, step2, end_of_sequence, mask_value, mask_value]

Then your shape will be (batch, 5, features).
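
As a rough sketch of that idea (the mask_value of -1.0, the all-ones end_of_sequence vector and the feature count are illustrative assumptions, not taken from your data):

import numpy as np
from keras.models import Model
from keras.layers import Input, LSTM, Dense, Masking

mask_value = -1.0     # assumed padding value; must never occur in the real data
max_len = 5           # longest sequence (4 steps) plus one end_of_sequence step
num_features = 2

# pad a 2-step sequence as [step1, step2, end_of_sequence, mask_value, mask_value]
end_of_sequence = np.ones(num_features)               # illustrative end_of_sequence vector
seq = np.random.random((2, num_features))
padded = np.full((max_len, num_features), mask_value)
padded[:2] = seq
padded[2] = end_of_sequence

# the Masking layer makes the LSTM skip every step that equals mask_value
inputs = Input(shape=(max_len, num_features))
masked = Masking(mask_value=mask_value)(inputs)
lstm_out = LSTM(10, return_sequences=True)(masked)
model = Model(inputs, Dense(num_features)(lstm_out))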


Another approach is described in your other question, where the user loops each step manually and checks whether the result of this step is the end_of_sequence token: Difference between two Sequence to Sequence Models keras (with and without RepeatVector)
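
A minimal sketch of such a loop, reusing encoder_model, decoder_model and num_decoder_tokens from your code; the all-ones end_of_sequence vector, the tolerance and the safety maximum are assumptions for illustration:

import numpy as np

end_of_sequence = np.ones(num_decoder_tokens)   # assumed marker your targets would have to contain
max_steps = 50                                  # safety maximum so the loop always terminates

input_seq = np.random.random((1, 10, num_encoder_tokens))   # one sequence per batch
states_values = encoder_model.predict(input_seq)

target_seq = np.zeros((1, 1, num_decoder_tokens))           # start step
decoded = []
for _ in range(max_steps):
    output_tokens, h, c = decoder_model.predict([target_seq] + states_values)
    step = output_tokens[0, 0, :]
    if np.allclose(step, end_of_sequence, atol=0.1):        # stop once end_of_sequence is produced
        break
    decoded.append(step)
    states_values = [h, c]
    target_seq = output_tokens                              # feed this step back as the next decoder input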

If this is an autoencoder, there is also another possibility for variable lengths, where you take the length directly from the input (must feed batches with only one sequence each, no padding/masking): How to apply LSTM-autoencoder to variant-length time-series data?

This is yet another approach, where the input length is stored explicitly in a reserved element of the latent vector and read back later (this must also run with only one sequence per batch, no padding): Variable length output in keras

Daniel Möller
  • One last question (I also commented below): does the for loop above need to run in range of timesteps, or is 1 also possible? – texaspythonic Jul 25 '18 at 21:22
  • You can make it infinite if you want (but your model must be good enough to always reach an `end_of_sequence`). So be careful and define a safety maximum. It doesn't seem to make sense to have it as 1. – Daniel Möller Jul 26 '18 at 02:41

Unfortunately you cannot do that. You have to pad your input to the maximum expected length. Then you can mask the padded steps, either with an Embedding layer (mask_zero=True) or with a Masking layer and a mask value:

keras.layers.Masking(mask_value=0.0)

See more information here.
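
For example, a minimal sketch of how a Masking layer could be placed in front of the encoder from the question (max_encoder_len and the mask value of 0.0 are assumptions; inputs would have to be zero-padded to that length):

from keras.layers import Input, LSTM, Masking

max_encoder_len = 10   # assumed maximum length every input sequence is padded to

encoder_inputs = Input(shape=(max_encoder_len, num_encoder_tokens))
masked_inputs = Masking(mask_value=0.0)(encoder_inputs)   # all-zero padded steps are skipped
encoder = LSTM(hidden_units, return_state=True)
encoder_outputs, state_h, state_c = encoder(masked_inputs)
encoder_states = [state_h, state_c]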

papayiannis
  • https://arxiv.org/pdf/1406.1078.pdf, because this paper is talking about a variable length of input and target. What would be the reference paper for the model above? – texaspythonic Jul 24 '18 at 15:46
  • The model you create using masking would still be of variable length effectively. The masking approach is an implementational choice for variable length inputs-outputs. It is what Keras uses to allow that. A discussion on this for Tensorflow is given [here](https://danijar.com/variable-sequence-lengths-in-tensorflow/). – papayiannis Jul 24 '18 at 16:17
  • @papayiannis thanks for the information. One last question: does the for loop need to be in range of the timesteps? – texaspythonic Jul 25 '18 at 21:21