
I have a Keras LSTM multitask model that performs two tasks. One is a sequence tagging task (so I predict a label per token). The other is a global classification task over the whole sequence using a CNN that is stacked on the hidden states of the LSTM.

In my setup (don't ask why) I only need the CNN task during training; the labels it predicts have no use in the final product. Now, in Keras one can train an LSTM model without specifying the input sequence length, like this:

l_input = Input(shape=(None,), dtype="int32", name=input_name)

However, if I add the CNN stacked on the LSTM hidden states, I need to set a fixed sequence length for the model:

l_input = Input(shape=(timesteps_size,), dtype="int32", name=input_name)

The problem is that once I have trained the model with a fixed timesteps_size, I can no longer use it to predict longer sequences.

In other frameworks this is not a problem, but in Keras I cannot get rid of the CNN and change the expected input shape of the model once it has been trained.

Here is a simplified version of the model:

from keras.models import Model
from keras.layers import (Input, Embedding, Bidirectional, GRU, TimeDistributed,
                          Dense, Conv1D, MaxPooling1D, Flatten, Concatenate)
from keras.optimizers import Adam

# input, labels and global_labels are dicts defined elsewhere
l_input = Input(shape=(timesteps_size,), dtype="int32")
l_embs  = Embedding(len(input.keys()), 100)(l_input)
l_blstm = Bidirectional(GRU(300, return_sequences=True))(l_embs)

# Sequential output
l_out1  = TimeDistributed(Dense(len(labels.keys()),
                                activation="softmax"))(l_blstm)


# Global output
conv1  = Conv1D( filters=5 , kernel_size=10 )( l_embs )
conv1  = Flatten()(MaxPooling1D(pool_size=2)( conv1 ))

conv2  = Conv1D( filters=5 , kernel_size=8 )( l_embs )
conv2  = Flatten()(MaxPooling1D(pool_size=2)( conv2 ))

conv   = Concatenate()( [conv1,conv2] )
conv   = Dense(50, activation="relu")(conv)

l_out2 = Dense( len(global_labels.keys()) ,activation='softmax')(conv)

model  = Model(inputs=l_input, outputs=[l_out1, l_out2])
optimizer = Adam()

model.compile(optimizer=optimizer,
              loss="categorical_crossentropy",
              metrics=["accuracy"])

I would like to know if anyone here has faced this issue, and whether there is any way to delete layers from a model after training and, more importantly, to reshape the input layer size after training.

Thanks

Gabriel M
  • It is not clear how your model is structured, i.e. how the layers are connected, so it is hard to say anything. Please provide more information. – today Dec 01 '18 at 08:18
  • thanks! I have edited to include a simplified version of the model that reproduces the problem – Gabriel M Dec 01 '18 at 12:48

1 Answer


The variable timesteps length is not a problem because of the convolution layers (actually, the good thing about convolution layers is that they do not depend on the input size). Rather, the Flatten layers cause the problem here, since they need an input of a specified size. Instead, you can use global pooling layers. Further, I think stacking convolution and pooling layers on top of each other might give a better result than using two separate convolution layers and merging them (although this depends on the specific problem and dataset you are working on). Considering these two points, it might be better to write your model like this:

# Global output
conv1 = Conv1D(filters=16, kernel_size=5)(l_embs)
conv1 = MaxPooling1D(pool_size=2)(conv1)

conv2 = Conv1D(filters=32, kernel_size=5)(conv1)
conv2 = MaxPooling1D(pool_size=2)(conv2)

gpool = GlobalAveragePooling1D()(conv2)

x = Dense(50, activation="relu")(gpool)
l_out2 = Dense(len(global_labels.keys()), activation='softmax')(x)

model  = Model(inputs=l_input, outputs=[l_out1, l_out2])

You may need to tune the number of conv+maxpool layers, number of filters, kernel size and even add dropout or batch normalization layers.
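
To make this concrete, here is a minimal, self-contained sketch (the vocabulary size, layer sizes and placeholder names like vocab_size, n_tags and n_classes are just illustrative, not your actual values): with global pooling instead of Flatten, the input length can stay None, so the very same model accepts sequences of any length at prediction time:

import numpy as np
from keras.models import Model
from keras.layers import (Input, Embedding, Bidirectional, GRU, Dense,
                          Conv1D, MaxPooling1D, GlobalAveragePooling1D)

vocab_size, n_tags, n_classes = 1000, 10, 3    # placeholder sizes

l_input = Input(shape=(None,), dtype="int32")  # no fixed timestep size
l_embs  = Embedding(vocab_size, 100)(l_input)
l_blstm = Bidirectional(GRU(64, return_sequences=True))(l_embs)

# Sequential output: one label per token
l_out1 = Dense(n_tags, activation="softmax")(l_blstm)

# Global output: conv -> pool -> conv -> pool -> global pooling
conv  = Conv1D(filters=16, kernel_size=5)(l_embs)
conv  = MaxPooling1D(pool_size=2)(conv)
conv  = Conv1D(filters=32, kernel_size=5)(conv)
conv  = MaxPooling1D(pool_size=2)(conv)
gpool = GlobalAveragePooling1D()(conv)         # collapses the variable time axis

x      = Dense(50, activation="relu")(gpool)
l_out2 = Dense(n_classes, activation="softmax")(x)

model = Model(inputs=l_input, outputs=[l_out1, l_out2])
model.compile(optimizer="adam", loss="categorical_crossentropy")

# The same model now handles sequences of different lengths:
for length in (20, 50, 120):
    seq = np.random.randint(0, vocab_size, size=(1, length))
    tags, global_pred = model.predict(seq)
    print(length, tags.shape, global_pred.shape)

The per-token output keeps the variable time dimension, while the global output is always a fixed-size vector thanks to the global pooling layer.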

As a side note, using TimeDistributed on a Dense layer is redundant as the Dense layer is applied on the last axis.
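
For example, using the names from the question's model, these two lines give the same (batch, timesteps, n_labels)-shaped output:

# Dense on a 3D tensor is applied to the last axis, i.e. per timestep,
# so wrapping it in TimeDistributed changes nothing here.
l_out1 = TimeDistributed(Dense(len(labels.keys()), activation="softmax"))(l_blstm)
l_out1 = Dense(len(labels.keys()), activation="softmax")(l_blstm)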

today