
I am trying to build an encoder-decoder model for text generation, using LSTM layers with an embedding layer. I have a problem passing the output of the embedding layer to the LSTM encoder layer. The error I get is:

 ValueError: Input 0 of layer lstm is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (None, 13, 128, 512)

My encoder data has shape (40, 13, 128) = (num_observations, max_encoder_seq_length, vocab_size), and the embedding size/latent_dim is 512.

My questions are: how can I get rid of this fourth dimension between the embedding layer and the encoder's LSTM layer, or in other words, how should I pass these dimensions to the encoder's LSTM layer? As I am new to this topic, is there anything I should also correct in the decoder's LSTM layer?

I have read several posts, including this, this one and many others, but couldn't find a solution. It seems to me that the problem is not in the model but rather in the shape of my data. Any hint or remark about what could potentially be wrong would be more than appreciated. Thank you very much.

My model, from this tutorial, is the following:

encoder_inputs = Input(shape=(max_encoder_seq_length,))
x = Embedding(num_encoder_tokens, latent_dim)(encoder_inputs)
x, state_h, state_c = LSTM(latent_dim, return_state=True)(x)
encoder_states = [state_h, state_c]

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(max_decoder_seq_length,))
x = Embedding(num_decoder_tokens, latent_dim)(decoder_inputs)
x = LSTM(latent_dim, return_sequences=True)(x, initial_state=encoder_states)
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(x)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()

# Compile & run training
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# Note that `decoder_target_data` needs to be one-hot encoded,
# rather than sequences of integers like `decoder_input_data`!
model.fit([encoder_input_data, decoder_input_data],
          decoder_target_data,
          batch_size=batch_size,
          epochs=epochs,
          shuffle=True,
          validation_split=0.05)

The summary of my model is:

Model: "functional_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 13)]         0                                            
__________________________________________________________________________________________________
input_2 (InputLayer)            [(None, 15)]         0                                            
__________________________________________________________________________________________________
embedding (Embedding)           (None, 13, 512)      65536       input_1[0][0]                    
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, 15, 512)      65536       input_2[0][0]                    
__________________________________________________________________________________________________
lstm (LSTM)                     [(None, 512), (None, 2099200     embedding[0][0]                  
__________________________________________________________________________________________________
lstm_1 (LSTM)                   (None, 15, 512)      2099200     embedding_1[0][0]                
                                                                 lstm[0][1]                       
                                                                 lstm[0][2]                       
__________________________________________________________________________________________________
dense (Dense)                   (None, 15, 128)      65664       lstm_1[0][0]                     
==================================================================================================
Total params: 4,395,136
Trainable params: 4,395,136
Non-trainable params: 0
__________________________________________________________________________________________________

Edit

I am formatting my data in the following way:

# one-hot encode each word of each sentence
for i, text in enumerate(input_texts):
    words = text.split()  # text is a sentence
    for t, word in enumerate(words):
        encoder_input_data[i, t, input_dict[word]] = 1.

This gives, for the command decoder_input_data[:2]:

array([[[0., 1., 0., ..., 0., 0., 0.],
        [0., 0., 1., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]],
       [[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 1., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]]], dtype=float32)
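A quick NumPy sketch (with toy shapes, not my real vocabulary) shows that this one-hot format is already 3-dimensional per batch, which is what makes the embedding output 4-dimensional; collapsing the vocab axis with argmax recovers the integer ids that an Embedding layer expects:

```python
import numpy as np

# Toy shapes for illustration: 2 sentences, max length 4, vocab size 6
one_hot = np.zeros((2, 4, 6), dtype=np.float32)
one_hot[0, 0, 1] = 1.  # first sentence: word ids 1, 2
one_hot[0, 1, 2] = 1.
one_hot[1, 0, 3] = 1.  # second sentence: word ids 3, 3
one_hot[1, 1, 3] = 1.

print(one_hot.shape)  # (2, 4, 6) -- already 3-D before the Embedding layer

# argmax over the vocab axis recovers integer ids; padded (all-zero)
# timesteps map to id 0, so id 0 should be reserved for padding
int_ids = one_hot.argmax(axis=-1)
print(int_ids.shape)  # (2, 4) -- the 2-D shape an Embedding layer expects
```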
  • Please add values for the shapes that are specified in the architecture and also, please provide full trace of the error. Which line in code is causing the error? – Akshay Sehgal Dec 28 '20 at 10:20

1 Answer


I am not sure what you are passing to the model as inputs and outputs, but the following works. Please note the shapes of the encoder and decoder inputs I am passing; your inputs need to be in that shape for the model to run.

### INITIAL CONFIGURATION
num_observations = 40
max_encoder_seq_length = 13
max_decoder_seq_length = 15
num_encoder_tokens = 128
num_decoder_tokens = 128
latent_dim = 512
batch_size = 256
epochs = 5

### MODEL DEFINITION
encoder_inputs = Input(shape=(max_encoder_seq_length,))
x = Embedding(num_encoder_tokens, latent_dim)(encoder_inputs)
x, state_h, state_c = LSTM(latent_dim, return_state=True)(x)
encoder_states = [state_h, state_c]

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(max_decoder_seq_length,))
x = Embedding(num_decoder_tokens, latent_dim)(decoder_inputs)
x = LSTM(latent_dim, return_sequences=True)(x, initial_state=encoder_states)
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(x)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

model.summary()

model.compile(optimizer='rmsprop', loss='categorical_crossentropy')


### MODEL INPUT AND OUTPUT SHAPES
encoder_input_data = np.random.random((1000,13))
decoder_input_data = np.random.random((1000,15))
decoder_target_data = np.random.random((1000, 15, 128))

model.fit([encoder_input_data, decoder_input_data],
          decoder_target_data,
          batch_size=batch_size,
          epochs=epochs,
          shuffle=True,
          validation_split=0.05)
Model: "functional_210"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_176 (InputLayer)          [(None, 13)]         0                                            
__________________________________________________________________________________________________
input_177 (InputLayer)          [(None, 15)]         0                                            
__________________________________________________________________________________________________
embedding_33 (Embedding)        (None, 13, 512)      65536       input_176[0][0]                  
__________________________________________________________________________________________________
embedding_34 (Embedding)        (None, 15, 512)      65536       input_177[0][0]                  
__________________________________________________________________________________________________
lstm_94 (LSTM)                  [(None, 512), (None, 2099200     embedding_33[0][0]               
__________________________________________________________________________________________________
lstm_95 (LSTM)                  (None, 15, 512)      2099200     embedding_34[0][0]               
                                                                 lstm_94[0][1]                    
                                                                 lstm_94[0][2]                    
__________________________________________________________________________________________________
dense_95 (Dense)                (None, 15, 128)      65664       lstm_95[0][0]                    
==================================================================================================
Total params: 4,395,136
Trainable params: 4,395,136
Non-trainable params: 0
__________________________________________________________________________________________________
Epoch 1/5
4/4 [==============================] - 3s 853ms/step - loss: 310.7389 - val_loss: 310.3570
Epoch 2/5
4/4 [==============================] - 3s 638ms/step - loss: 310.6186 - val_loss: 310.3362
Epoch 3/5
4/4 [==============================] - 3s 852ms/step - loss: 310.6126 - val_loss: 310.3345
Epoch 4/5
4/4 [==============================] - 3s 797ms/step - loss: 310.6111 - val_loss: 310.3369
Epoch 5/5
4/4 [==============================] - 3s 872ms/step - loss: 310.6117 - val_loss: 310.3352

The sequence data (text) needs to be passed to the inputs as label-encoded sequences. This can be done with something like the TextVectorization layer from Keras. Please read more about how to prepare text data for embedding layers and LSTMs here.
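The idea, sketched in plain NumPy without Keras (the sentences and vocabulary below are made up for illustration), is to map each word to an integer id and pad every sentence to the max length:

```python
import numpy as np

# Assumed toy vocabulary and sentences; id 0 is reserved for padding
vocab = {"the": 1, "cat": 2, "sat": 3, "dog": 4, "ran": 5}
sentences = ["the cat sat", "the dog ran"]
max_len = 4

# label-encoded sequences, shape (num_observations, max_len) -- 2-D,
# which is what the Embedding layer expects
encoder_input_data = np.zeros((len(sentences), max_len), dtype=np.int64)
for i, text in enumerate(sentences):
    for t, word in enumerate(text.split()):
        encoder_input_data[i, t] = vocab[word]

print(encoder_input_data)
# [[1 2 3 0]
#  [1 4 5 0]]
```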

    Thank you very much for your response. The problem is definitely coming from the formatting of my data. Please see my edited question for the way I am formatting it. I will have a look at the link you have provided. I will accept the answer as soon as I get my model working. Again, thanks a lot. According to your link, it makes no sense here to do binary classification. I need to modify this – Joachim Dec 28 '20 at 11:13
  • If I may ask, how should I do the encoding for the decoder_target_data? I am asking because my understanding of what is explained in your article is only valid for the encoder_input_data and the decoder_input_data (TextVectorization(.... output_mode='int')), which are 2-dimensional vectors. But the decoder_target_data is in 3 dimensions, so is it also possible to use TextVectorization? If yes, how should it be done? Thank you very much for your response – Joachim Dec 28 '20 at 12:57
  • The encoder and decoder both have the same vocabulary, so they both need to be label encoded with the same vocabulary as part of the embedding layers. That's why I used 128 as the vocab_size for both of them. – Akshay Sehgal Dec 28 '20 at 13:22
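Following up on that last comment: one way to build the 3-D one-hot decoder_target_data from label-encoded sequences (sketched here in plain NumPy with toy values; Keras's to_categorical utility does the same job) is:

```python
import numpy as np

vocab_size = 8  # assumed toy vocab size
# label-encoded decoder targets, shape (num_observations, max_decoder_seq_length)
targets = np.array([[1, 2, 3, 0],
                    [1, 4, 5, 0]])

# indexing an identity matrix one-hots each id; the result has shape
# (num_observations, max_decoder_seq_length, vocab_size)
decoder_target_data = np.eye(vocab_size, dtype=np.float32)[targets]
print(decoder_target_data.shape)  # (2, 4, 8)
```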