I would like to implement the model depicted in the following picture using Keras, but I have no idea how to do it.
If the input of the model is given as (batch, max_length_sentence, max_length_of_word), how would I need to implement it?
If I understand your question correctly, each training sample consists of multiple sentences, and each sentence consists of multiple words (it seems each training sample is the set of sentences of a text document). The first LSTM layer processes one sentence at a time, and once all the sentences have been processed, the sentence representations produced by the first LSTM layer are fed to the second LSTM layer.
To implement this architecture, you need to wrap the first LSTM layer inside a TimeDistributed
layer so that it processes each sentence individually. Then you can simply add another LSTM layer on top to process the outputs of the first LSTM layer. Here is a sketch:
from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed

lstm1_units = 128          # units of the word-level (first) LSTM
lstm2_units = 64           # units of the sentence-level (second) LSTM
max_num_sentences = 10
max_num_words = 100
emb_dim = 256              # dimensionality of the word vectors

model = Sequential()
# The first LSTM is applied to each sentence independently and returns one
# vector of size lstm1_units per sentence.
model.add(TimeDistributed(LSTM(lstm1_units),
                          input_shape=(max_num_sentences, max_num_words, emb_dim)))
# With return_sequences=True the second LSTM emits one output per sentence.
model.add(LSTM(lstm2_units, return_sequences=True))
model.summary()
Model summary:
Layer (type) Output Shape Param #
=================================================================
time_distributed_4 (TimeDist (None, 10, 128) 197120
_________________________________________________________________
lstm_6 (LSTM) (None, 10, 64) 49408
=================================================================
Total params: 246,528
Trainable params: 246,528
Non-trainable params: 0
_________________________________________________________________
As you can see, since we have used return_sequences=True
for the second LSTM layer, it returns an output for each sentence (this is in accordance with the figure in your question). Further, note that we have assumed here that the words are already represented as word vectors (i.e. word embeddings). If that's not the case and your input consists of word indices, as the shape (batch, max_length_sentence, max_length_of_word) suggests, you can simply add an Embedding
layer (wrapped in a TimeDistributed
layer) as the first layer to map the words to embeddings; the rest stays the same.
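A minimal sketch of that variant, assuming the input now holds integer word indices of shape (batch, max_num_sentences, max_num_words) and a hypothetical vocab_size:
from keras.models import Sequential
from keras.layers import Embedding, LSTM, TimeDistributed

vocab_size = 20000         # hypothetical vocabulary size; set it to your own data
emb_dim = 256
lstm1_units, lstm2_units = 128, 64
max_num_sentences, max_num_words = 10, 100

model = Sequential()
# The Embedding layer turns word indices into word vectors; wrapped in
# TimeDistributed it is applied to each sentence, producing an output of
# shape (batch, max_num_sentences, max_num_words, emb_dim).
model.add(TimeDistributed(Embedding(vocab_size, emb_dim),
                          input_shape=(max_num_sentences, max_num_words)))
# The rest is the same as before.
model.add(TimeDistributed(LSTM(lstm1_units)))
model.add(LSTM(lstm2_units, return_sequences=True))
model.summary()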