
I trained a Many-to-Many sequence model in Keras with return_sequences=True and a TimeDistributed wrapper on the final Dense layer:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, TimeDistributed

model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=50))
model.add(LSTM(100, return_sequences=True))
model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))
# train...
model.save_weights("weights.h5")


During training, the loss is therefore calculated over the outputs at every timestep. But for inference I only need the output at the last timestep, so I load the weights into a Many-to-One sequence model (without the TimeDistributed wrapper) and set return_sequences=False to get only the last output of the LSTM layer:

inference_model = Sequential()
inference_model.add(Embedding(input_dim=vocab_size, output_dim=50))
inference_model.add(LSTM(100, return_sequences=False))
inference_model.add(Dense(vocab_size, activation='softmax'))

inference_model.load_weights("weights.h5")

When I test my inference model on a sequence of length 20, I expect a prediction of shape (vocab_size), but inference_model.predict(...) still returns predictions for every timestep: a tensor of shape (20, vocab_size).

  • As already stated in this [answer](https://stackoverflow.com/a/52092176/2099607), `TimeDistributed(Dense(...))` and `Dense(...)` are equivalent, since `Dense` layer is applied on the last dimension of its input Tensor. – today Mar 07 '19 at 14:24
  • Are you sure you're not confusing "samples" (dimension 0) with "time steps" (dimension 1)? – Daniel Möller Mar 07 '19 at 14:27

1 Answer


If, for whatever reason, you need only the last timestep during inference, you can build a new model which applies the trained model to the input and returns only the last timestep of its output, using a Lambda layer:

from keras.models import Model
from keras.layers import Input, Lambda

inp = Input(shape=put_the_input_shape_here)
x = model(inp) # apply trained model on the input
out = Lambda(lambda x: x[:, -1])(x)  # keep only the last timestep

inference_model = Model(inp, out)
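
If it helps, a quick shape check on this wrapped model could look like the sketch below (maxlen here is just a stand-in for whatever sequence length you used in the Input shape above):

import numpy as np

test_batch = np.random.randint(0, vocab_size, size=(1, maxlen))  # one sample with maxlen timesteps
print(inference_model.predict(test_batch).shape)                 # (1, vocab_size): only the last timestep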

Side note: As already stated in this answer, TimeDistributed(Dense(...)) and Dense(...) are equivalent, since the Dense layer is applied on the last dimension of its input tensor. That is why you get the same output shape.
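
For reference, here is a quick sketch of that equivalence on a toy setup (the vocab/steps numbers below are made up, not taken from the question):

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, TimeDistributed

vocab, steps = 1000, 20
x = np.random.randint(0, vocab, size=(2, steps))  # 2 samples, 20 timesteps each

with_td = Sequential([Embedding(vocab, 50, input_length=steps),
                      LSTM(100, return_sequences=True),
                      TimeDistributed(Dense(vocab, activation='softmax'))])

without_td = Sequential([Embedding(vocab, 50, input_length=steps),
                         LSTM(100, return_sequences=True),
                         Dense(vocab, activation='softmax')])

print(with_td.predict(x).shape)     # (2, 20, 1000)
print(without_td.predict(x).shape)  # (2, 20, 1000) -- Dense acts on the last axis, so the shapes match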

  • Oh. Is there a way to apply TimeDistributed(Dense(...)) to every timestep of the LSTM output? – nidomo Mar 07 '19 at 15:22
  • @nidomo Well, I am not sure what you mean exactly as it is already applied on all the timesteps. – today Mar 07 '19 at 17:12