
I am building a bidirectional LSTM autoencoder and am adding an attention layer on top of it.

Before adding the attention layer, everything works fine. I got the idea for the attention layer from this post. After adding attention, the model complains about a dimension incompatibility.

This is my code after adding attention:

inputs = Input(shape=(SEQUENCE_LEN, EMBED_SIZE), name="input")
encoded = Bidirectional(LSTM(LATENT_SIZE, return_sequences=True), name="encoder_lstm")(inputs)
attention = Dense(SEQUENCE_LEN, activation='tanh')(encoded)
attention = Flatten()(attention)
attention = Activation('softmax')(attention)
attention = RepeatVector(SEQUENCE_LEN)(attention)
attention = Permute([2, 1])(attention)
sent_representation = merge([encoded, attention], mode='mul')
sent_representation = Lambda(lambda xin: K.sum(xin, axis=-2), output_shape=(units,))(sent_representation)
autoencoder = Model(inputs, sent_representation)
autoencoder.compile(optimizer="sgd", loss='mse')

This is the error I get:

Using TensorFlow backend.
(?, 40, 50)
(?, 40, 40)
Traceback (most recent call last):
(?, 40, 40)
  File "/home/sgnbx/Downloads/projects/LSTM_autoencoder-master/walkingaround.py", line 131, in <module>
    sent_representation = merge([activations, attention], mode='mul')
  File "/home/sgnbx/anaconda3/envs/tf_gpu/lib/python3.4/site-packages/keras/engine/topology.py", line 470, in __call__
    self.assert_input_compatibility(x)
  File "/home/sgnbx/anaconda3/envs/tf_gpu/lib/python3.4/site-packages/keras/engine/topology.py", line 411, in assert_input_compatibility
    str(K.ndim(x)))
Exception: Input 0 is incompatible with layer dense_1: expected ndim=2, found ndim=3

I have read a couple of posts about this error, namely this, this and this, but they are not the same as my case. Some of them suggest setting return_sequences=False, but I do not think that is the correct fix; later parts of the code raise another error if I set it to False.
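To make the difference concrete, here is a small sketch of the output shapes for the two settings (SEQUENCE_LEN and EMBED_SIZE are taken from the shape prints above; LATENT_SIZE is just an example value):

from keras.layers import Input, LSTM, Bidirectional
from keras import backend as K

# Example sizes: SEQUENCE_LEN and EMBED_SIZE match the shape prints above,
# LATENT_SIZE is just a placeholder value.
SEQUENCE_LEN, EMBED_SIZE, LATENT_SIZE = 40, 50, 20

inputs = Input(shape=(SEQUENCE_LEN, EMBED_SIZE))

# return_sequences=True: one output per timestep -> 3D tensor
seq_out = Bidirectional(LSTM(LATENT_SIZE, return_sequences=True))(inputs)
print(K.int_shape(seq_out))   # (None, 40, 40) = (batch, SEQUENCE_LEN, 2 * LATENT_SIZE)

# return_sequences=False: only the final output -> 2D tensor
last_out = Bidirectional(LSTM(LATENT_SIZE, return_sequences=False))(inputs)
print(K.int_shape(last_out))  # (None, 40) = (batch, 2 * LATENT_SIZE)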

So I feel like I am doing something wrong; otherwise, why would the network raise an error with a fairly standard architecture?

My question is: what is wrong with this network, and how can I fix it?

I would appreciate a detailed explanation so that I can understand it better, or some links that discuss the conflict in my code.

Thanks in advance!

sariii

1 Answer


Correct the line below:

encoded = Bidirectional(LSTM(LATENT_SIZE, return_sequences=False), name="encoder_lstm")(inputs)

Just set return_sequences=False.
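
For context, here is a minimal sketch of how that change fits into the model from the question (same variable names; this is only a sketch, not the full autoencoder). Note that with return_sequences=False the encoder outputs a single 2D vector per sample, so the later layers in the question that expect a time axis would also need to be adapted:

inputs = Input(shape=(SEQUENCE_LEN, EMBED_SIZE), name="input")

# return_sequences=False -> output shape (batch, 2 * LATENT_SIZE): one vector per sample
encoded = Bidirectional(LSTM(LATENT_SIZE, return_sequences=False), name="encoder_lstm")(inputs)

# The attention block in the question (Flatten, RepeatVector, Permute, merge) was written
# for a 3D (batch, timesteps, features) tensor, so it has to be rethought once the
# encoder no longer returns sequences.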

Panciz