I am trying to build an encoder-decoder model for text generation, using LSTM layers together with an embedding layer. Somehow I have a problem with passing the output of the embedding layer to the LSTM encoder layer. The error I get is:
ValueError: Input 0 of layer lstm is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (None, 13, 128, 512)
My encoder data has shape (40, 13, 128) = (num_observations, max_encoder_seq_length, vocab_size), and the embedding size / latent_dim is 512.
My questions are: how can I get rid of this 4th dimension between the embedding layer and the LSTM encoder layer, or in other words, how should I pass these 4 dimensions to the LSTM layer of the encoder model? And since I am new to this topic, is there anything I should possibly also correct in the decoder LSTM layer?
I have read several posts, including this, this one and many others, but couldn't find a solution. It seems to me that the problem is not in the model but rather in the shape of the data. Any hint or remark about what could be wrong would be more than appreciated. Thank you very much.
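To make the mismatch concrete, here is a small standalone check (not my actual training code, just the same numbers as my data): applying an Embedding layer to a batch that still carries the vocab_size axis adds the 512-dimensional embedding axis on top of it, which gives exactly the 4D shape from the error message.
import numpy as np
from tensorflow.keras.layers import Embedding

one_hot_batch = np.zeros((1, 13, 128), dtype="float32")  # one sample shaped like my encoder data
emb_out = Embedding(128, 512)(one_hot_batch)             # Embedding casts to int and looks up every value
print(emb_out.shape)                                      # (1, 13, 128, 512) -> 4D
# Feeding this into LSTM(512) then raises: expected ndim=3, found ndim=4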
My model, taken from this tutorial, is the following:
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

# Set up the encoder; only its final states are kept.
encoder_inputs = Input(shape=(max_encoder_seq_length,))
x = Embedding(num_encoder_tokens, latent_dim)(encoder_inputs)
x, state_h, state_c = LSTM(latent_dim, return_state=True)(x)
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(max_decoder_seq_length,))
x = Embedding(num_decoder_tokens, latent_dim)(decoder_inputs)
x = LSTM(latent_dim, return_sequences=True)(x, initial_state=encoder_states)
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(x)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()
# Compile & run training
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# Note that `decoder_target_data` needs to be one-hot encoded,
# rather than sequences of integers like `decoder_input_data`!
model.fit([encoder_input_data, decoder_input_data],
          decoder_target_data,
          batch_size=batch_size,
          epochs=epochs,
          shuffle=True,
          validation_split=0.05)
The summary of my model is:
Model: "functional_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 13)] 0
__________________________________________________________________________________________________
input_2 (InputLayer) [(None, 15)] 0
__________________________________________________________________________________________________
embedding (Embedding) (None, 13, 512) 65536 input_1[0][0]
__________________________________________________________________________________________________
embedding_1 (Embedding) (None, 15, 512) 65536 input_2[0][0]
__________________________________________________________________________________________________
lstm (LSTM) [(None, 512), (None, 2099200 embedding[0][0]
__________________________________________________________________________________________________
lstm_1 (LSTM) (None, 15, 512) 2099200 embedding_1[0][0]
lstm[0][1]
lstm[0][2]
__________________________________________________________________________________________________
dense (Dense) (None, 15, 128) 65664 lstm_1[0][0]
==================================================================================================
Total params: 4,395,136
Trainable params: 4,395,136
Non-trainable params: 0
__________________________________________________________________________________________________
Edit
I am formatting my data in the following way:
for i, text in enumerate(input_texts):
    words = text.split()  # text is a sentence
    for t, word in enumerate(words):
        encoder_input_data[i, t, input_dict[word]] = 1.
which gives the following output for decoder_input_data[:2]:
array([[[0., 1., 0., ..., 0., 0., 0.],
        [0., 0., 1., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],

       [[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 1., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]]], dtype=float32)
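Would converting these one-hot arrays back into plain sequences of integer word indices (which, as far as I understand, is what the Embedding layer expects) be the right fix? A sketch of what I mean, assuming the decoder arrays are built the same way as the encoder ones:
import numpy as np

encoder_input_ids = np.argmax(encoder_input_data, axis=-1)  # (40, 13) word ids instead of one-hot rows
decoder_input_ids = np.argmax(decoder_input_data, axis=-1)  # (40, 15)
# Note: argmax over an all-zero padding row returns 0, so padding would need extra care here.
print(encoder_input_ids.shape, decoder_input_ids.shape)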