
I have a model with a dot-product attention layer, which I have commented out in the code below. How do I use self-attention instead of the attention layer I have? Basically, I want to replace the commented part with a self-attention layer.

I am open to keras-self-attention or a manually added layer; anything that works.

from tensorflow.keras.layers import (Input, Embedding, LSTM, Bidirectional,
                                     Concatenate, Dense, TimeDistributed,
                                     Activation, dot)
from tensorflow.keras.models import Model

# Encoder
encoder_inputs = Input(shape=(max_text_len, ))

# Embedding layer
enc_emb = Embedding(x_voc, embedding_dim,
                    trainable=True)(encoder_inputs)

# Encoder LSTM 1
encoder_lstm1 = Bidirectional(LSTM(latent_dim, return_sequences=True,
                     return_state=True, dropout=0.4,
                     recurrent_dropout=0.4))
(encoder_output1, forward_h1, forward_c1, backward_h1, backward_c1) = encoder_lstm1(enc_emb)

# Encoder LSTM 2
encoder_lstm2 = Bidirectional(LSTM(latent_dim, return_sequences=True,
                     return_state=True, dropout=0.4,
                     recurrent_dropout=0.4))
(encoder_output2, forward_h2, forward_c2, backward_h2, backward_c2) = encoder_lstm2(encoder_output1)

# Encoder LSTM 3
encoder_lstm3 = Bidirectional(LSTM(latent_dim, return_state=True,
                     return_sequences=True, dropout=0.4,
                     recurrent_dropout=0.4))
(encoder_outputs, forward_h, forward_c, backward_h, backward_c) = encoder_lstm3(encoder_output2)

state_h = Concatenate()([forward_h, backward_h])
state_c = Concatenate()([forward_c, backward_c])

# Set up the decoder, using encoder_states as the initial state
decoder_inputs = Input(shape=(None, ))

# Embedding layer
dec_emb_layer = Embedding(y_voc, embedding_dim, trainable=True)
dec_emb = dec_emb_layer(decoder_inputs)


# Decoder LSTM
decoder_lstm = LSTM(latent_dim*2, return_sequences=True,
                    return_state=True, dropout=0.4,
                    recurrent_dropout=0.2)
(decoder_outputs, decoder_fwd_state, decoder_back_state) = \
    decoder_lstm(dec_emb, initial_state=[state_h, state_c])

#Start attention layer
# attention = dot([decoder_outputs, encoder_outputs], axes=[2, 2])
# attention = Activation('softmax')(attention)
# context = dot([attention, encoder_outputs], axes=[2,1])
# decoder_outputs = Concatenate()([context, decoder_outputs])
#End attention layer

# Dense layer
decoder_dense = TimeDistributed(Dense(y_voc, activation='softmax'))(decoder_outputs)

# Define the model
model = Model([encoder_inputs, decoder_inputs], decoder_dense)
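
A minimal sketch of the keras-self-attention option mentioned above (untested; it assumes the package's SeqSelfAttention layer, which returns a tensor of the same shape as its input), dropped in where the commented block sits:

from keras_self_attention import SeqSelfAttention

# Self-attention over the decoder states; shape stays (batch, time, latent_dim*2)
self_attention = SeqSelfAttention(attention_activation='sigmoid')(decoder_outputs)
decoder_outputs = Concatenate()([self_attention, decoder_outputs])
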
  • Use `tf.keras.layers.Attention` with the `causal` argument set to `True` to make it a self-attention layer. `causal` expects a `Boolean. Set to True for decoder self-attention. Adds a mask such that position i cannot attend to positions j > i. This prevents the flow of information from the future towards the past.` Take a look at https://www.tensorflow.org/api_docs/python/tf/keras/layers/Attention. Thanks! –  Jul 28 '21 at 09:01
  • Thank you. I did `attention = Attention(causal = True)([encoder_outputs,decoder_outputs])` and the model fits successfully. However, even with this model I get the same error I raised in this [separate question](https://stackoverflow.com/questions/68444781/keras-model-trains-successfully-but-generating-predictions-gives-valueerror-gr). Maybe you have an idea? Would help me a lot. Thanks! – BlueMango Jul 30 '21 at 11:03
  • I actually created a new question [here](https://stackoverflow.com/questions/68590104/graph-disconnected-cannot-obtain-value-for-tensor-kerastensor-in-inference-mode) – BlueMango Jul 30 '21 at 11:21
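
A minimal sketch of the `tf.keras.layers.Attention` suggestion from the first comment (untested; assumes TF 2.x, where the layer is called on a `[query, value]` list and the `causal` constructor argument is still available; newer releases move it to a `use_causal_mask` call argument):

from tensorflow.keras.layers import Attention

# Causal self-attention over the decoder states (query = value = decoder_outputs)
self_attn = Attention(causal=True)([decoder_outputs, decoder_outputs])

# Encoder-decoder (cross) attention, equivalent to the commented dot/softmax/dot block
context = Attention()([decoder_outputs, encoder_outputs])

decoder_outputs = Concatenate()([self_attn, context, decoder_outputs])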

0 Answers