
I'm trying to understand RNNs in general (not one specific architecture) using Reber Grammar inputs (not embedded for now). You can find the Jupyter notebook on this link (please disregard the markdown cells; they are left over from a failed first version and are not up to date :) ).

For every timestep, I provide the input and expected output for the training (so it's a many-to-many model).

  • Input/output are "OneHotEncoded" (based on the string "BTSXPVE") so for example

    • B is [1, 0, 0, 0, 0, 0, 0]
    • V is [0, 0, 0, 0, 0, 1, 0]
  • For the timesteps, I have strings of unknown length (not encoded here to make it clearer), for example:

    • BPVVE
    • BPVPXVPXVPXVVE

so I decided to pad them to 20 timesteps.

  • For the batch, I'm free. I've generated 2048 encoded strings for training and 256 for test.

So my input tensor is (2048, 20, 7). My output tensor is also (2048, 20, 7) because for every timestep I would like to get the prediction.
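The encoding and padding described above can be sketched in plain Python (helper names are mine, not from the notebook):

```python
# Hypothetical helpers illustrating the one-hot encoding and padding
# described above; names are not from the original notebook.
ALPHABET = "BTSXPVE"
MAXLEN = 20

def one_hot(letter):
    """One-hot encode a single Reber letter against the string 'BTSXPVE'."""
    vec = [0] * len(ALPHABET)
    vec[ALPHABET.index(letter)] = 1
    return vec

def encode_padded(word, maxlen=MAXLEN):
    """Encode a word and left-pad with zero vectors to maxlen timesteps,
    matching the default 'pre' padding of Keras' pad_sequences."""
    encoded = [one_hot(c) for c in word]
    padding = [[0] * len(ALPHABET) for _ in range(maxlen - len(encoded))]
    return padding + encoded

x = encode_padded("BPVVE")  # 20 timesteps of 7 features
```

Stacking 2048 such encoded words gives the (2048, 20, 7) input tensor.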

So I trained 3 many-to-many models (SimpleRNN, GRU and LSTM) with code like the following (here the LSTM variant).

from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(units=7, input_shape=(maxlen, 7), return_sequences=True))
model.compile(loss='mse',
              optimizer='Nadam',
              metrics=['mean_squared_error'])

history = model.fit(X_train, y_train, validation_data=(X_test, y_test),
                    epochs=1500, batch_size=1024)

As expected, for every timestep I get a probability for each possible value, for example (after a bit of cleanup):

B predict [ 0, 0.622, 0, 0, 0.401, 0, 0] (~62% chance of T, ~40% chance of P)

This matches the graph used to generate the words.


Now, I would like to use this model to generate strings (so a one-to-many model), but I have no idea how to keep the trained model and use it as a generator.

I thought of inputting only B (padded to 20 timesteps), getting the result, concatenating B with the best index of the output, padding that to 20 timesteps, feeding the new input to the NN, and so on. But I'm pretty sure this is not the way we should do it :s

Moreover, I tried inputting 'B' and 'T' to check the probability of the next output (it should be S or X), but I got:

from keras.preprocessing import sequence

X = np.array([[[1,0,0,0,0,0,0], [0,1,0,0,0,0,0]]])  # one-hot for [B, T]
X = sequence.pad_sequences(X, maxlen=20)
print(model.predict(X)[0])

[0, 0.106, 0.587, 0.1, 0, 0.171, 0.007]
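Mapping the 7 output positions back onto the alphabet string "BTSXPVE" makes such a vector easier to read (a small sketch, not from the notebook):

```python
ALPHABET = "BTSXPVE"
probs = [0, 0.106, 0.587, 0.1, 0, 0.171, 0.007]  # the output above

# Pair each probability with its letter and drop the near-zero entries
readable = {letter: p for letter, p in zip(ALPHABET, probs) if p > 0.01}
# readable -> {'T': 0.106, 'S': 0.587, 'X': 0.1, 'V': 0.171}
```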

What I understand is that it predicts T (10%), S (60%), X (10%), V (17%), but after BT I should get a higher probability for X and nearly none for V/T (V and T directly after a T are only possible right after B/P). It's as if my model didn't take the n-1 previous timesteps into account. So maybe my model is wrong :(


Many thanks for your support,

Nicolas M.
  • What did you use as `Y`? Was it `X` shifted by one step? – Daniel Möller Oct 27 '17 at 17:16
  • Exactly, I shift it by 1 step and add the "E" as final output (to keep the same size) – Nicolas M. Oct 27 '17 at 17:17
  • About the precision of the model, maybe it's just too tiny for understanding those sequences. Add more layers, you can for instance have one (or more) first layer with more units, then this last layer with 7 units. – Daniel Möller Oct 27 '17 at 17:43
  • I'll try that, but I have doubts about the result because the NN "creates" entropy. I have a binary input and get a "probability" as output. If I feed that to another layer, it will create noise from this noise, I guess, but I'll try for learning purposes – Nicolas M. Oct 27 '17 at 18:00
  • I mean a single model with two or more layers. This makes the model "smarter" (it takes longer to learn, but it learns better). Having too many layers or units may make your model too smart though (it memorizes the training data, but can't understand the test data - this is called overfitting). – Daniel Möller Oct 27 '17 at 18:21

1 Answer


You can remake this model as a stateful=True model. Make it work with timesteps=1 (or None for variable length).

Remaking the model:

newModel = Sequential()

newModel.add(LSTM(units=7, stateful=True, batch_input_shape=(1, 1, 7), return_sequences=True))

Getting the weights from the other model:

newModel.set_weights(model.get_weights())

Using the model in predictions:

Now, with this model, you must input only one step at once. And you must be careful to reset_states() every time you're going to input a new sequence:

So, suppose we've got the starting letter B.

startingLetter = oneHotForBWithShape((1,1,7))


#we are starting a new "sentence", so, let's reset states:
newModel.reset_states()

#now the prediction loop:
nextLetter = startingLetter
while not isEndLetter(nextLetter):  # hypothetical check; comparing arrays with != is ambiguous
    probabilities = newModel.predict(nextLetter)
    nextLetter = chooseOneFromTheProbabilities(probabilities)  # back to a one-hot with shape (1, 1, 7)
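`chooseOneFromTheProbabilities` is left abstract above; one possible implementation samples an index proportionally to the predicted distribution and returns it as a one-hot vector (a sketch, assuming the prediction has been flattened to a plain list of 7 probabilities):

```python
import random

ALPHABET = "BTSXPVE"

def chooseOneFromTheProbabilities(probs, rng=random):
    """Sample a letter index according to the predicted probabilities and
    return its one-hot encoding, ready to be fed back into the model."""
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    one_hot = [0] * len(probs)
    one_hot[idx] = 1
    return one_hot
```

Sampling (rather than always taking the argmax) lets the generator explore both branches of the grammar; a greedy argmax would produce the same string every time.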

About the quality of the results.... maybe your model is just too tiny for that.

You can try adding more layers, for instance:

model = Sequential()

model.add(LSTM(units=50, input_shape=(maxlen, 7), return_sequences=True))
model.add(LSTM(units=30, return_sequences=True))
model.add(LSTM(units=7, return_sequences=True))

This choice was arbitrary, I don't know if it's good enough or too good for your data.

Daniel Möller
  • Awesome, the possibility to set weights from one model onto another "different" one!!! Thanks a lot :) – Nicolas M. Oct 27 '17 at 17:36
  • Yes :) -- The secret is that LSTMs simply don't care about the length of the sequences. The length gets important inside the "states", but not in the "weights". As long as the two models have the same number of layers, with the same number of units and input features, they're the same model. – Daniel Möller Oct 27 '17 at 17:40
  • I was a bit afraid when I saw your reply, but it also works with GRU and SimpleRNN, so now I'll be able to compare them – Nicolas M. Oct 27 '17 at 17:46
  • By the way.... do you know any page teaching to make a notebook like that of yours on github? I'm going to start using it. – Daniel Möller Oct 27 '17 at 17:49
  • You have to do your notebook on your computer and push it to GitHub when needed (for example to show it to someone or to "store" it). When it's done, I just push it to my git (using the git command line or a tool like git gui). If you want to install Jupyter Notebook you can follow http://jupyter.readthedocs.io/en/latest/install.html; to use git, there are several tutorials, but I used https://www.youtube.com/watch?v=BCQHnlnPusY&list=PLRqwX-V7Uu6ZF9C0YMKuns9sLDzK6zoiV – Nicolas M. Oct 27 '17 at 17:58