
I am working on a seq2seq chatbot. I would like to ask how to ignore PAD symbols in the chatbot's responses while val_acc is being computed.

For example, my model generates the response: [I, am, reading, a, book, PAD, PAD, PAD, PAD, PAD]

But the right response should be: [My, brother, is, playing, football, PAD, PAD, PAD, PAD, PAD].

In this case, the chatbot's response is completely wrong, yet val_acc is 50% because of the padding symbols.

I use Keras with an encoder-decoder model (https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html) and teacher forcing.

My code is here:

from keras.models import Model
from keras.layers import Input, LSTM, Embedding, Dense

# Encoder
encoder_inputs = Input(shape=(sentenceLength,), name="Encoder_input")
encoder = LSTM(n_units, return_state=True, name='Encoder_lstm')
Shared_Embedding = Embedding(output_dim=embedding, input_dim=vocab_size, name="Embedding", mask_zero=True)
word_embedding_context = Shared_Embedding(encoder_inputs)
encoder_outputs, state_h, state_c = encoder(word_embedding_context)
encoder_states = [state_h, state_c]

# Decoder (teacher forcing: fed the target sequence shifted by one timestep)
decoder_inputs = Input(shape=(None,), name="Decoder_input")
decoder_lstm = LSTM(n_units, return_sequences=True, return_state=True, name="Decoder_lstm")
word_embedding_answer = Shared_Embedding(decoder_inputs)
decoder_outputs, _, _ = decoder_lstm(word_embedding_answer, initial_state=encoder_states)
decoder_dense = Dense(vocab_size, activation='softmax', name="Dense_layer")
decoder_outputs = decoder_dense(decoder_outputs)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

Encoder input is a sentence where each word is an integer and 0 is padding: [1, 2, 5, 4, 3, 0, 0, 0] -> the user's question.

Decoder input is also a sentence where each word is an integer, 0 is padding and 100 is the GO symbol: [100, 8, 4, 2, 0, 0, 0, 0, 0] -> the chatbot response shifted by one timestep.

Decoder output is the same sentence, where the words are integers and these integers are one-hot encoded: [8, 4, 2, 0, 0, 0, 0, 0, 0] -> the chatbot response (integers one-hot encoded).
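
For reference, here is a rough sketch of how these three arrays can be built with `pad_sequences` and `to_categorical` (illustrative only; the padded lengths and variable names here are assumptions, not my exact preprocessing):

from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical

GO = 100                      # index of the GO symbol
question = [1, 2, 5, 4, 3]    # user question as word indices
answer = [8, 4, 2]            # chatbot response as word indices

# Encoder input: the question padded with 0 up to sentenceLength
encoder_input_data = pad_sequences([question], maxlen=sentenceLength, padding='post')
# Decoder input: GO + response, padded with 0 (shifted by one timestep)
decoder_input_data = pad_sequences([[GO] + answer], maxlen=sentenceLength + 1, padding='post')
# Decoder target: the response padded with 0, then one-hot encoded over the vocabulary
decoder_target_data = to_categorical(
    pad_sequences([answer], maxlen=sentenceLength + 1, padding='post'),
    num_classes=vocab_size)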

The problem is that val_acc is too high, even when the model predicts totally wrong sentences. I think this is caused by the padding. Is there something wrong with my model? Should I add another mask to my decoder?

Here are my accuracy and loss graphs: [accuracy plot] [loss plot]


1 Answer


You are correct: it is because that tutorial doesn't use Masking (documentation) to ignore the padding values, and it only shows examples with equal input and output lengths. In your case the model will still take PAD as input and output PAD, but the mask will ignore them. For example, to mask the encoder:

from keras.layers import Input, LSTM, Masking

# Define an input sequence and process it.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
# Keep the Input tensor under its own name so it can still be passed to Model(...)
encoder_masked = Masking()(encoder_inputs)  # assuming PAD is all zeros
encoder = LSTM(latent_dim, return_state=True)
# Now the LSTM will ignore the PADs when encoding
# by skipping those timesteps that are masked
encoder_outputs, state_h, state_c = encoder(encoder_masked)
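
For the reported val_acc specifically, one option (a minimal sketch, not from the tutorial; it assumes PAD is class index 0 and the targets are one-hot encoded) is a custom metric that only counts non-PAD timesteps:

import keras.backend as K

def masked_accuracy(y_true, y_pred):
    # y_true: one-hot targets, shape (batch, timesteps, vocab_size)
    # y_pred: softmax outputs of the same shape
    true_ids = K.argmax(y_true, axis=-1)
    pred_ids = K.argmax(y_pred, axis=-1)
    mask = K.cast(K.not_equal(true_ids, 0), K.floatx())             # 0 where the target is PAD
    matches = K.cast(K.equal(true_ids, pred_ids), K.floatx()) * mask
    return K.sum(matches) / K.maximum(K.sum(mask), 1.0)             # average over real tokens only

You can then pass it when compiling, e.g. `model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=[masked_accuracy])`, so the padded positions no longer inflate the reported accuracy.
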
nuric
  • I use the mask_zero argument in the Embedding layer: `Embedding(output_dim=embedding, input_dim=vocab_size, name="Embedding", mask_zero=True)`. Is it also necessary to use your code `encoder_masked = Masking()(encoder_inputs)`? When I use the embedding with the mask_zero argument, nothing happens. Should I use some kind of masking in my decoder model? I think that val_acc is computed from the generated sentence, which still contains padding symbols. Am I wrong? – Lukáš Richtarik May 12 '18 at 13:13
  • No, `mask_zero` is correct and has the same effect; it attaches a mask based on the input. The LSTM skips those timesteps and [loss calculation](https://github.com/keras-team/keras/blob/cbadaf00e28f7fe42762b55f52294e3a7bb90515/keras/engine/training_utils.py#L408) zeros out masked outputs. It's difficult to say what your model is doing without the code; make sure you are masking inputs and outputs correctly. – nuric May 12 '18 at 14:31
  • I have edited my question and attached the code of my model. Can you see any error in my code? Should I add another mask to my decoder? – Lukáš Richtarik May 12 '18 at 16:07
  • Seems fine; you don't need return_state for the decoder. Have you checked that the loss for your example is correct when val_acc is 50%? – nuric May 12 '18 at 16:19
  • I have attached the acc and loss graphs to my question. I trained my model with 8000 sentences and tested with 1000 sentences. It seems that the model cannot learn anything. When I try to predict some sentences from the test set, the model predicts random words and some PAD symbols. I think that val_acc should be much lower. – Lukáš Richtarik May 12 '18 at 18:26
  • That looks like classic overfitting though. To confirm your theory about val_acc you can train with `batch_size=1` and remove any padding, just give single data points at a time. – nuric May 12 '18 at 18:30
  • I will try it. Anyway, I want to make sure the problem isn't caused by the optimizer I use. I use RMSprop. Should I use a different optimizer for the chatbot task? Or is there no fixed choice and I should try each optimizer? – Lukáš Richtarik May 12 '18 at 19:48
  • @nuric The answer you provided above doesn't work. When I put in Masking following the example above, I received the following error when compiling the model: **TypeError: Input layers to a `Model` must be `InputLayer` objects. Received inputs: [, ]. Input 0 (0-based) originates from layer type `Masking`.** Any idea how to fix this? Thanks. – x112341 May 29 '18 at 03:08
  • I'm not a Keras expert, but according to another thread (https://stackoverflow.com/a/47060797/3063422): "If there's a recurrent layer with return_sequences=False, the mask stop propagates". The encoder has return_sequences=False, which means the mask you applied to the embedding doesn't really work. – Yoel Zeldes Mar 12 '19 at 13:19