Keras sequence models - how to generate data during test/generation?

Question

Is there a way to use an already trained RNN (SimpleRNN or LSTM) model to generate new sequences in Keras?

I'm trying to modify an exercise from the Coursera Deep Learning Specialization - Sequence Models course, where you train an RNN to generate dinosaur names. In the exercise you build the RNN using only numpy, but I want to use Keras.

One of the problems is that the sequences (dino names) have different lengths, so I used padding and set the sequence length to the maximum length appearing in the dataset (I padded with 0, which is also the code for '\n').

My question is: how do I generate the actual sequence once training is done? In the numpy version of the exercise you take the softmax output of the previous cell and use it as a distribution to sample a new input for the next cell. But is there a way to connect the output of the previous cell to the input of the next cell in Keras during testing/generation time?

Also, some additional side questions:

Since I'm using padding, I suspect the accuracy is way too optimistic. Is there a way to tell Keras not to include the padding values in its accuracy calculations?

Am I even doing this right? Is there a better way to use Keras with sequences of different lengths?

You can check my (WIP) code here.

thushv89 · Accepted Answer · 2019-11-19T20:22:26.700

Inferring from a model that has been trained on a sequence

This is a pretty common thing to do with RNN models, and in Keras the best way (at least as far as I know) is to create two different models:

One model for training (which uses sequences instead of individual items)
Another model for predicting (which uses a single element instead of a sequence)

So let's see an example. Suppose you have the following model.

from tensorflow.keras import models, layers

n_chars = 26
timesteps = 10
inp = layers.Input(shape=(timesteps,  n_chars))
lstm = layers.LSTM(100, return_sequences=True)
out1 = lstm(inp)
dense = layers.Dense(n_chars, activation='softmax')
out2 = layers.TimeDistributed(dense)(out1)
model = models.Model(inp, out2)
model.summary()

Now to infer from this model, you create another model which looks like the one below.

inp_infer = layers.Input(shape=(1, n_chars))
# Inputs to feed LSTM states back in
h_inp_infer = layers.Input(shape=(100,))
c_inp_infer = layers.Input(shape=(100,))
# We need return_state=True so we are creating a new layer
lstm_infer = layers.LSTM(100, return_state=True, return_sequences=True)
out1_infer, h, c  = lstm_infer(inp_infer, initial_state=[h_inp_infer, c_inp_infer])
out2_infer = layers.TimeDistributed(dense)(out1_infer)

# Our model takes the previous states as inputs and spits out new states as outputs
model_infer = models.Model([inp_infer, h_inp_infer, c_inp_infer], [out2_infer, h, c])

# We are setting the weights from the trained model
lstm_infer.set_weights(lstm.get_weights())
model_infer.summary()

So what's different? We have defined a new input layer that accepts an input with only one timestep (in other words, a single item), and the model produces an output with a single timestep (technically we don't need the TimeDistributed layer, but I've kept it for consistency). Other than that, the model takes the previous LSTM state as an input and produces the new state as an output. More specifically, the inference model has the following inputs and outputs.

Input: [(None, 1, n_chars), (None, 100), (None, 100)] list of tensors
Output: [(None, 1, n_chars), (None, 100), (None, 100)] list of tensors

Note that I'm either copying the weights from the trained model into the new layers or reusing the existing layers of the training model. The inference model would be pretty useless if you didn't reuse the trained layers and weights.

Now we can write inference logic.

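import numpy as np

# Placeholder start input (random 0/1 values) with a single timestep
x = np.random.randint(0, 2, size=(1, 1, n_chars))
# Initial LSTM hidden and cell states
h = np.zeros(shape=(1, 100))
c = np.zeros(shape=(1, 100))
seq_len = 10
for _ in range(seq_len):
  print(x)
  # One step: get the next-item distribution and the updated states
  y_pred, h, c = model_infer.predict([x, h, c])
  # Drop the timestep axis: (1, 1, n_chars) -> (1, n_chars)
  y_pred = y_pred[:, 0, :]
  # One-hot encode the most likely item and feed it back as the next input
  y_onehot = np.zeros(shape=(x.shape[0], n_chars))
  y_onehot[np.arange(x.shape[0]), np.argmax(y_pred, axis=1)] = 1.0
  x = np.expand_dims(y_onehot, axis=1)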

This loop starts from an initial x, h and c. Each iteration gets the prediction y_pred together with the new h and c, converts y_pred into a one-hot input in the following lines, and assigns it back to x. You keep going for as many iterations as you like.
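The loop above always picks the most likely item with argmax. The numpy exercise mentioned in the question samples from the softmax output instead, and that could be sketched roughly as follows. The helper name `sample_one_hot` and the `rng` argument are made up for this illustration and are not part of Keras; the sketch assumes each row of `probs` is a softmax output of shape `(n_chars,)`.

```python
import numpy as np

def sample_one_hot(probs, rng):
    """Sample one index per row of `probs` and return it one-hot encoded.

    probs: array of shape (batch, n_chars), each row a probability distribution.
    rng:   a numpy random Generator, e.g. np.random.default_rng().
    """
    # Work in float64 and renormalize, since float32 softmax outputs
    # may not sum to exactly 1, which rng.choice would reject.
    probs = np.asarray(probs, dtype=np.float64)
    probs = probs / probs.sum(axis=1, keepdims=True)
    one_hot = np.zeros_like(probs)
    for row, p in enumerate(probs):
        # Draw an index according to the distribution in this row
        one_hot[row, rng.choice(p.shape[0], p=p)] = 1.0
    return one_hot
```

Inside the generation loop, the argmax-based one-hot lines would then be replaced by something like `x = np.expand_dims(sample_one_hot(y_pred, rng), axis=1)`, where `y_pred` is the model's prediction with the timestep axis removed. Sampling gives different sequences on each run, whereas argmax always produces the same one for a given start input.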

About masking zeros

Keras does offer a Masking layer which can be used for this purpose. The second answer in this question seems to be what you're looking for.
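To see what excluding padding from the accuracy means in practice, here is a minimal numpy sketch that computes accuracy over the real timesteps only. It is a manual calculation for illustration, not the Keras Masking layer itself. Because the question pads with 0, which is also the code for '\n', the sketch assumes you keep the true length of each sequence and builds the mask from those lengths rather than from the padding value. The name `masked_accuracy` and its arguments are hypothetical.

```python
import numpy as np

def masked_accuracy(y_true, y_pred, lengths):
    """Accuracy over non-padded timesteps only.

    y_true:  (batch, timesteps) integer item ids, padded at the end.
    y_pred:  (batch, timesteps, n_chars) predicted probabilities.
    lengths: (batch,) true (unpadded) length of each sequence.
    """
    y_true = np.asarray(y_true)
    pred_ids = np.argmax(y_pred, axis=-1)
    timesteps = y_true.shape[1]
    # mask[i, t] is True while t is inside the real part of sequence i
    mask = np.arange(timesteps)[None, :] < np.asarray(lengths)[:, None]
    correct = (pred_ids == y_true) & mask
    return correct.sum() / mask.sum()
```

With heavily padded batches, a plain accuracy counts every correctly predicted padding position as a hit, which is why it tends to look too optimistic compared with the masked value.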

I get it - you create an lstm cell object, and then you train it, and in inference time you use it, so you can use it with the learned weights and all. Cool - will give it a try. — Maverick Meerkat, Nov 19 '19 at 11:28
I'm not sure, but does this model actually preserve the state from previous cells? — Maverick Meerkat, Nov 19 '19 at 15:54
Ok, so I think it doesn't. I now changed the lstm cell to return_state=True, and used it in the inference model, and the results are much much better. Will add it as a separate answer — Maverick Meerkat, Nov 19 '19 at 16:29
Though my version creates a new inference model every new step, while it does give better results, I'm not sure it is the best way to do it... — Maverick Meerkat, Nov 19 '19 at 16:52
@DavidRefaeli Good catch, I think we have to include the state as an input, because it probably won't preserve states. I'll update my code accordingly — thushv89, Nov 19 '19 at 20:02
