
I have been looking into a Keras implementation of a certain deep learning architecture when I came across a technicality that I could not grasp. In the code, the model is implemented as having two inputs: the first is the normal input that goes through the graph (word_ids in the sample code below), while the second is the length of that input, which seems to be involved nowhere other than in the inputs argument of the Keras Model instance (sequence_lengths in the sample code below).

from keras.models import Model
from keras.layers import Input, Embedding, Dense, LSTM, Bidirectional

# `embeddings` is a pretrained embedding matrix of shape (vocab_size, embedding_dim)
word_ids = Input(batch_shape=(None, None), dtype='int32')
word_embeddings = Embedding(input_dim=embeddings.shape[0],
                            output_dim=embeddings.shape[1],
                            mask_zero=True,
                            weights=[embeddings])(word_ids)
x = Bidirectional(LSTM(units=64, return_sequences=True))(word_embeddings)
x = Dense(64, activation='tanh')(x)
x = Dense(10)(x)
sequence_lengths = Input(batch_shape=(None, 1), dtype='int32')
model = Model(inputs=[word_ids, sequence_lengths], outputs=[x])

I think this is done to make the network accept a sequence of any length. My questions are as follows:

  1. Is what I think correct?
  2. If yes, then I feel like there is a bit of magic going on under the hood. Any suggestions on how to wrap one's head around this?
  3. Does this mean that, using this method, one doesn't need to pad one's sequences (neither in training nor in prediction) and that Keras will somehow know how to pad them automatically?

1 Answer


Do you need to pass sequence_lengths as an input?

No, it's absolutely not necessary to pass the sequence lengths as inputs, whether you're working with fixed-length or variable-length sequences.

I honestly don't understand why the model in that code declares this input, given that it isn't sent to any of the model's layers to be processed.

Is this really the complete model?

Why would one pass the sequence lengths as an input?

Well, maybe they want to perform some custom calculations with the lengths. That might be an interesting option, but none of those calculations are present (or shown) in the code you posted. This model is doing absolutely nothing with this input.

How to work with variable sequence length?

For that, you've got two options:

  • Pad the sequences, as you mentioned, to a fixed size, and add a Masking layer to the input (or use the mask_zero=True option in the embedding layer); see the sketch after this list.
  • Use the length dimension as None. This is done with one of these:
    • batch_shape=(batch_size, None)
    • input_shape=(None,)
    • PS: these shapes are for Embedding layers. An input that goes directly into recurrent networks would have an additional last dimension for input features
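
For the first option, here is a minimal sketch of padding plus masking. The vocabulary size, layer sizes, and sequences are made up for illustration, not taken from your model:

import numpy as np
from keras.models import Model
from keras.layers import Input, Embedding, LSTM, Bidirectional, Dense
from keras.preprocessing.sequence import pad_sequences

# hypothetical variable-length sequences of word ids (0 is reserved for padding)
sequences = [[1, 2, 3], [4, 5], [6]]
padded = pad_sequences(sequences, maxlen=5)  # shape (3, 5), zero-padded at the front

word_ids = Input(shape=(5,), dtype='int32')
x = Embedding(input_dim=100, output_dim=16, mask_zero=True)(word_ids)  # masks the zeros
x = Bidirectional(LSTM(32, return_sequences=True))(x)
x = Dense(10)(x)
model = Model(inputs=word_ids, outputs=x)
preds = model.predict(padded)  # shape (3, 5, 10); masked steps are skipped by the LSTM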

When using the second option (length = None), you should process each batch separately, because you cannot put sequences with different lengths in the same numpy array. But there is no limitation in the model itself, and no padding is necessary in this case.
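
A minimal sketch of this batch-by-batch approach, again with made-up data and arbitrary layer sizes (the all-zero labels are just placeholders):

import numpy as np
from keras.models import Model
from keras.layers import Input, Embedding, LSTM, Dense

word_ids = Input(batch_shape=(None, None), dtype='int32')  # any batch size, any length
x = Embedding(input_dim=100, output_dim=16)(word_ids)
x = LSTM(32, return_sequences=True)(x)
x = Dense(10, activation='softmax')(x)
model = Model(inputs=word_ids, outputs=x)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# each batch groups sequences of equal length, so no padding is needed anywhere
batch_a = np.array([[1, 2, 3], [4, 5, 6]])  # two sequences of length 3
batch_b = np.array([[7, 8, 9, 10]])         # one sequence of length 4
for batch in (batch_a, batch_b):
    dummy_labels = np.zeros((batch.shape[0], batch.shape[1], 1))
    model.train_on_batch(batch, dummy_labels)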

How to work with "unlimited" length?

The only way to work with unlimited length is to use stateful=True.

In this case, every batch you pass will not be seen as "another group of sequences", but "additional steps of the previous batch".
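
A minimal sketch of that idea, with arbitrary sizes. Stateful layers need a fixed batch size, and you reset the states yourself when the sequences really end:

import numpy as np
from keras.models import Model
from keras.layers import Input, LSTM, Dense

# a fixed batch of 2 sequences, fed 5 steps at a time, with 8 features per step
chunks = Input(batch_shape=(2, 5, 8))
x = LSTM(32, return_sequences=True, stateful=True)(chunks)
x = Dense(10)(x)
model = Model(inputs=chunks, outputs=x)

chunk1 = np.random.random((2, 5, 8))  # steps 0-4 of two long sequences
chunk2 = np.random.random((2, 5, 8))  # steps 5-9 of the same two sequences
model.predict(chunk1)
model.predict(chunk2)   # continues from the internal state left by chunk1
model.reset_states()    # call this only when the sequences are really over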

Daniel Möller
  • Thanks for the detailed answer. I'm sorry, I did some edits on the network to make it more readable, and this slipped my attention. The model can be found [here](https://github.com/Hironsan/anago/blob/master/anago/models.py) in the SeqLabeling class. I have edited my question above to reflect this. I guess even if you go through the model in the referenced GitHub repo, you won't change your answer at all. Anyway, regarding the last point about `stateful=True`, does this still apply if we are talking about sequence tagging? – cad86 Nov 26 '17 at 16:10
  • By sequence tagging, I mean something like a part-of-speech tagging task. – cad86 Nov 26 '17 at 16:11
  • I don't know exactly what sequence tagging is. But `stateful=True` is a method to input new "steps" in separate batches. (In contrast to `stateful=False` which is for inputting new "sequences" in separate batches) --- See more about stateful vs non stateful here: https://stackoverflow.com/questions/43882796/when-does-keras-reset-an-lstm-state – Daniel Möller Nov 26 '17 at 16:13
  • Yes... I don't see in that file a reason for that input. If that file is all there is, then it's useless. It might become useful only if somewhere they decide to stack another model on top of that one and actually use that input for something. – Daniel Möller Nov 26 '17 at 17:06