
I am currently trying to add an embedding layer to my sequence-to-sequence autoencoder, built with the Keras functional API.

The model code looks like this:

from keras.layers import Input, Embedding, LSTM, Dense, TimeDistributed
from keras.models import Model

#Encoder inputs
encoder_inputs = Input(shape=(None,))

#Embedding
embedding_layer = Embedding(input_dim=n_tokens, output_dim=2)
encoder_embedded = embedding_layer(encoder_inputs)

#Encoder LSTM
encoder_outputs, state_h, state_c = LSTM(n_hidden, return_state=True)(encoder_embedded)
lstm_states = [state_h, state_c]


#Decoder Inputs
decoder_inputs = Input(shape=(None,)) 

#Embedding
decoder_embedded = embedding_layer(decoder_inputs)

#Decoder LSTM
decoder_lstm = LSTM(n_hidden, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_embedded, initial_state=lstm_states)


#Dense + Time
decoder_dense = TimeDistributed(Dense(n_tokens, activation='softmax'), input_shape=(None, None, 256))
#decoder_dense = Dense(n_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

The model is trained like this:

model.fit([X, y], X, epochs=n_epoch, batch_size=n_batch)

where X and y both have shape (n_samples, n_seq_len).

Compiling the model works flawlessly, but when I try to train it, I always get:

ValueError: Error when checking target: expected time_distributed_1 to have 3 dimensions, but got array with shape (n_samples, n_seq_len)

Does anybody have an idea?

Keras version: 2.2.4
TensorFlow backend version: 1.12.0

1 Answer


In such an autoencoder, since the last layer is a softmax classifier, you need to one-hot encode the labels:

from keras.utils import to_categorical

one_hot_X = to_categorical(X)

model.fit([X, y], one_hot_X, ...)

As a side note, since a Dense layer is applied to the last axis of its input, there is no need to wrap it in a TimeDistributed layer.
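
For illustration, the decoder head could then look like this (a minimal sketch reusing n_tokens and decoder_outputs from the question, essentially the commented-out line that is already there):

decoder_dense = Dense(n_tokens, activation='softmax')
# Dense acts on the last axis only, so each timestep gets its own softmax
# over n_tokens classes; output shape is (batch, n_seq_len, n_tokens).
decoder_outputs = decoder_dense(decoder_outputs)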

– today
  • Hi today, thanks for your answer. I understand your point about the need for one-hot encoding. The problem is that I have a lot of different labels and try to avoid one-hot encoding when I can. Do you know how I can change the layout of the model so that a smaller (embedding?) vector gets predicted? – BorisMirheiss Dec 17 '18 at 09:59
  • @BorisMirheiss If your problem is that you don't want to one-hot encode all the labels beforehand (because of RAM constraints), then you can write a generator that generates one-hot encoded labels on the fly (see the sketch below). – today Dec 17 '18 at 14:04
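
A minimal sketch of such a generator (the helper name batch_generator and the fit_generator wiring are illustrative assumptions, not from the original post):

import numpy as np
from keras.utils import to_categorical

def batch_generator(X, y, n_tokens, batch_size):
    # Yield ([X_batch, y_batch], one_hot_targets) indefinitely, one-hot
    # encoding only one batch at a time to keep memory usage low.
    n_samples = X.shape[0]
    while True:
        for start in range(0, n_samples, batch_size):
            X_batch = X[start:start + batch_size]
            y_batch = y[start:start + batch_size]
            targets = to_categorical(X_batch, num_classes=n_tokens)
            yield [X_batch, y_batch], targets

steps = int(np.ceil(X.shape[0] / n_batch))
model.fit_generator(batch_generator(X, y, n_tokens, n_batch),
                    steps_per_epoch=steps, epochs=n_epoch)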