Tensorflow seq2seq chatbot always give the same outputs

Question

I'm trying to make a seq2seq chatbot with Tensorflow, but it seems to converge to the same outputs despite different inputs. The model gives different outputs when first initialized, but quickly converges to the same outputs after a few epochs. This is still an issue even after a lot of epochs and low costs. However, the models seems to do fine when trained with smaller datasets (say 20) but it fails with larger ones.

I'm training on the Cornell Movie Dialogs Corpus with a 100-dimensional and 50000-vocab glove pretrained embedding.

The encoder seems to have very close final states (in the ranges of around 0.01) when given totally different inputs. I've tried using a simple LSTM/GRU, bidirectional LSTM/GRU, multi-layer/stacked LSTM/GRU, and multi-layer bidirection LSTM/GRU. The rnn nodes have been tested with from 16 to 2048 hidden units. The only difference is that the model tends to output only the start and end tokens (GO and EOS) when having lesser hidden units.

For multi-layer GRU, here's my code:

cell_encode_0 = tf.contrib.rnn.GRUCell(self.n_hidden)
cell_encode_1 = tf.contrib.rnn.GRUCell(self.n_hidden)
cell_encode_2 = tf.contrib.rnn.GRUCell(self.n_hidden)
self.cell_encode = tf.contrib.rnn.MultiRNNCell([cell_encode_0, cell_encode_1, cell_encode_2])
# identical decoder

...

embedded_x = tf.nn.embedding_lookup(self.word_embedding, self.x)
embedded_y = tf.nn.embedding_lookup(self.word_embedding, self.y)

_, self.encoder_state = tf.nn.dynamic_rnn(
    self.cell_encode,
    inputs=embedded_x,
    dtype=tf.float32,
    sequence_length=self.x_length
    )

# decoder for training
helper = tf.contrib.seq2seq.TrainingHelper(
    inputs=embedded_y,
    sequence_length=self.y_length
    )

decoder = tf.contrib.seq2seq.BasicDecoder(
    self.cell_decode,
    helper,
    self.encoder_state,
    output_layer=self.projection_layer
    )

outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder, maximum_iterations=self.max_sequence, swap_memory=True)

return outputs.rnn_output

...

# Optimization
dynamic_max_sequence = tf.reduce_max(self.y_length)
mask = tf.sequence_mask(self.y_length, maxlen=dynamic_max_sequence, dtype=tf.float32)
crossent = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=self.y[:, :dynamic_max_sequence], logits=self.network())
self.cost = (tf.reduce_sum(crossent * mask) / batch_size)
self.train_op = tf.train.AdamOptimizer(self.learning_rate).minimize(self.cost)

For the full code, please see github. (In case if you want to test it out, run train.py)

As for hyper-parameters, I've tried learning rates from 0.1 all the way to 0.0001 and batch sizes from 1 to 32. Other than the regular and expected effects, they do not help with the problem.

score 3 · Accepted Answer · edited Jun 20 '20 at 09:12

After digging around for months, I've finally found the issue. It seems that the RNN requires GO tokens in decoder inputs but not outputs (the one you use for cost). Basically, the RNN expects its data as the following:

Encoder input: GO foo foo foo EOS

Decoder input/ground truth: GO bar bar bar EOS

Decoder output: bar bar bar EOS EOS/PAD

In my code, I included the GO token in both decoder input and output, causing the RNN to repeat the same tokens (GO -> GO, bar -> bar). This can be easily fixed by creating an additional variable that does not have the first column (GO tokens) of the ground truth. In numpy, this looks something like

# y is ground truth with shape[0] = batch and shape[1] = token index
np.concatenate([y[:, 1:], np.full([y.shape[0], 1], EOS)], axis=1)

Tensorflow seq2seq chatbot always give the same outputs

1 Answers1

Linked