I am working on a seq2seq chatbot. I would like to ask how to ignore PAD symbols in the chatbot's responses while val_acc is being computed.
For example, my model generates the response: [I, am, reading, a, book, PAD, PAD, PAD, PAD, PAD]
But the right response should be: [My, brother, is, playing, football, PAD, PAD, PAD, PAD, PAD].
In this case, the chatbot's answer is completely wrong, yet val_acc is 50% because of the padding symbols.
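To illustrate what I mean (a minimal sketch with made-up token IDs, not my real vocabulary indices): none of the real words match, only the five PAD positions do, yet plain token-level accuracy still reports 0.5.

import numpy as np

# Hypothetical integer-encoded sequences, 0 = PAD; no real word matches
y_true = np.array([10, 11, 12, 13, 14, 0, 0, 0, 0, 0])  # "My brother is playing football"
y_pred = np.array([20, 21, 22, 23, 24, 0, 0, 0, 0, 0])  # "I am reading a book"

plain_acc = np.mean(y_true == y_pred)            # 0.5, inflated by the PAD positions
mask = y_true != 0
masked_acc = np.mean((y_true == y_pred)[mask])   # 0.0, which is what I would expect
print(plain_acc, masked_acc)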
I use Keras with an encoder-decoder model (https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html) and teacher forcing.
My code is here:
from keras.layers import Input, LSTM, Embedding, Dense
from keras.models import Model

# Encoder: embed the padded question and keep only the final LSTM states
encoder_inputs = Input(shape=(sentenceLength,), name="Encoder_input")
encoder = LSTM(n_units, return_state=True, name='Encoder_lstm')
Shared_Embedding = Embedding(output_dim=embedding, input_dim=vocab_size, name="Embedding", mask_zero=True)
word_embedding_context = Shared_Embedding(encoder_inputs)
encoder_outputs, state_h, state_c = encoder(word_embedding_context)
encoder_states = [state_h, state_c]

# Decoder: teacher forcing, initialised with the encoder states
decoder_inputs = Input(shape=(None,), name="Decoder_input")
decoder_lstm = LSTM(n_units, return_sequences=True, return_state=True, name="Decoder_lstm")
word_embedding_answer = Shared_Embedding(decoder_inputs)
decoder_outputs, _, _ = decoder_lstm(word_embedding_answer, initial_state=encoder_states)
decoder_dense = Dense(vocab_size, activation='softmax', name="Dense_layer")
decoder_outputs = decoder_dense(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
The encoder input is the user question: a sentence where each word is an integer and 0 is padding, e.g. [1,2,5,4,3,0,0,0].
The decoder input is the chatbot response shifted by one time step: also a sentence of integers, where 0 is padding and 100 is the GO symbol, e.g. [100,8,4,2,0,0,0,0,0].
The decoder output (target) is the chatbot response, e.g. [8,4,2,0,0,0,0,0,0], with the integers one-hot encoded.
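For reference, this is roughly how I build the decoder arrays (the concrete values, the fixed length of 9 and the placeholder vocab_size are just for illustration; my real preprocessing differs in details):

import numpy as np
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical

GO, PAD = 100, 0
vocab_size = 101          # placeholder; the real value matches the Embedding layer
answer = [8, 4, 2]        # integer-encoded target sentence

# Decoder input: GO + answer, padded with 0 up to the fixed length
decoder_input = pad_sequences([[GO] + answer], maxlen=9, padding='post', value=PAD)

# Decoder target: the answer itself, padded and then one-hot encoded over the vocabulary
decoder_target = pad_sequences([answer], maxlen=9, padding='post', value=PAD)
decoder_target = to_categorical(decoder_target, num_classes=vocab_size)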
The problem is that val_acc is too high, even when the model predicts totally wrong sentences. I think this is caused by the padding. Is there something wrong with my model? Should I add another mask to my decoder?
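This is the kind of masked accuracy metric I was thinking of adding when compiling (just a sketch, I am not sure it is the right approach; the optimizer and loss here are placeholders for whatever I actually use):

from keras import backend as K

def masked_accuracy(y_true, y_pred):
    # y_true is one-hot encoded, so the PAD class (index 0) can be detected via argmax
    true_ids = K.argmax(y_true, axis=-1)
    pred_ids = K.argmax(y_pred, axis=-1)
    mask = K.cast(K.not_equal(true_ids, 0), K.floatx())       # 1 for real tokens, 0 for PAD
    matches = K.cast(K.equal(true_ids, pred_ids), K.floatx()) * mask
    return K.sum(matches) / K.maximum(K.sum(mask), 1.0)       # accuracy over non-PAD tokens only

model.compile(optimizer='rmsprop', loss='categorical_crossentropy',
              metrics=[masked_accuracy])

Would something like this be enough, or does the decoder itself also need an explicit mask?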