
I am trying to add an attention layer for my text classification model. The inputs are texts (e.g. movie reviews), and the output is a binary outcome (e.g. positive vs. negative).

from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, CuDNNGRU, Dense

# max_features and maxlen are set elsewhere in my preprocessing code
model = Sequential()
model.add(Embedding(max_features, 32, input_length=maxlen))
model.add(Bidirectional(CuDNNGRU(16, return_sequences=True)))
##### add attention layer here #####
model.add(Dense(1, activation='sigmoid'))

After some searching, I found a couple of ready-to-use attention layers for Keras. There is the keras.layers.Attention layer that is built into Keras. There are also the SeqWeightedAttention and SeqSelfAttention layers in the keras-self-attention package. As a person who is relatively new to the deep learning field, I have a hard time understanding the mechanism behind these layers.

What does each of these layers do? Which one would be best for my model?

Thank you very much!


1 Answer


If you are using an RNN, I would not recommend using the keras.layers.Attention class.

While analysing the tf.keras.layers.Attention GitHub code to better understand how to use it, the first line I came across was: "This class is suitable for Dense or CNN networks, and not for RNN networks."

There is another open-source version maintained by CyberZHG called keras-self-attention. To the best of my knowledge this is NOT part of the Keras or TensorFlow library and seems to be an independent piece of code. It contains the two classes you mentioned: SeqWeightedAttention and SeqSelfAttention. The former returns a 2D output (one vector per sequence) and the latter a 3D output (one vector per timestep), so SeqWeightedAttention should work for your situation. SeqWeightedAttention seems to be loosely based on Raffel et al. and can be used for sequence classification; SeqSelfAttention seems to be a variation of Bahdanau-style attention.
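As a rough sketch (assuming the keras-self-attention package is installed and that SeqWeightedAttention is importable from keras_self_attention, as its README suggests; do verify against the package docs), plugging it into your model would look something like this:

from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, CuDNNGRU, Dense
from keras_self_attention import SeqWeightedAttention

model = Sequential()
model.add(Embedding(max_features, 32, input_length=maxlen))
model.add(Bidirectional(CuDNNGRU(16, return_sequences=True)))
# SeqWeightedAttention collapses the time axis: (batch, timesteps, 32) -> (batch, 32)
model.add(SeqWeightedAttention())
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])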

In general, I would suggest writing your own sequence-to-classification model; see the sketch below. The attention piece can be added in less than half a dozen lines of code (in its bare-bones essence), which is much less effort than integrating, debugging, or understanding the code in these external libraries.
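For illustration, here is a bare-bones sketch of such a layer (Raffel-style additive attention; the class name and implementation details are mine, not from any library):

from keras import backend as K
from keras.layers import Layer

class SimpleAttention(Layer):
    """Learns one score per timestep, softmaxes the scores over time,
    and returns the attention-weighted sum of the inputs."""
    def build(self, input_shape):
        self.w = self.add_weight(name='att_weight',
                                 shape=(int(input_shape[-1]), 1),
                                 initializer='glorot_uniform',
                                 trainable=True)
        super(SimpleAttention, self).build(input_shape)

    def call(self, x):
        # x: (batch, timesteps, features)
        scores = K.squeeze(K.tanh(K.dot(x, self.w)), axis=-1)  # (batch, timesteps)
        weights = K.softmax(scores)                            # attention over time
        return K.sum(x * K.expand_dims(weights, -1), axis=1)   # (batch, features)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[-1])

With this in place, the placeholder line in your model simply becomes model.add(SimpleAttention()).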

Please refer to: Create an LSTM layer with Attention in Keras for multi-label text classification neural network
