
With the following code:

from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout, Flatten, PReLU

model = Sequential()

# data has shape (samples, timesteps, features)
num_features = data.shape[2]
num_samples = data.shape[1]  # i.e. the number of timesteps per sample

model.add(
    LSTM(16, batch_input_shape=(None, num_samples, num_features), return_sequences=True, activation='tanh'))
model.add(PReLU())
model.add(Dropout(0.5))
model.add(LSTM(8, return_sequences=True, activation='tanh'))
model.add(Dropout(0.1))
model.add(PReLU())
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))

I'm trying to understand how I can add an attention mechanism before the first LSTM layer. I've found the GitHub repository keras-attention-mechanism by Philippe Rémy, but couldn't figure out how exactly to use it with my code.

I would also like to visualize the attention mechanism and see which features the model focuses on.

Any help would be appreciated, especially a code modification. Thanks :)

Shlomi Schwartz
  • Here's a simple way to add attention: https://stackoverflow.com/questions/62948332/how-to-add-attention-layer-to-a-bi-lstm/62949137#62949137 – Marco Cerliani Jul 17 '20 at 14:58

2 Answers


You may find an example of how to use an LSTM with an attention mechanism in Keras in this gist:

https://gist.github.com/mbollmann/ccc735366221e4dba9f89d2aab86da1e

And in the following answer on SO:

How to add an attention mechanism in keras?

To visualize the activations, you can use the following repository: https://github.com/philipperemy/keras-activations
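
To make that concrete, below is a minimal sketch of one way to wire an attention block in front of the question's LSTM stack using the Keras functional API. The softmax-over-timesteps weighting and all layer names here are illustrative assumptions, not code copied from the gist or the linked answer:

from keras.models import Model
from keras.layers import Input, Dense, LSTM, Dropout, Flatten, PReLU, Permute, Multiply

num_timesteps, num_features = data.shape[1], data.shape[2]

inputs = Input(shape=(num_timesteps, num_features))

# "attention before the LSTM": learn softmax weights over the timesteps
# and rescale the input with them
a = Permute((2, 1))(inputs)                                # (batch, features, timesteps)
a = Dense(num_timesteps, activation='softmax')(a)          # softmax over timesteps
attention = Permute((2, 1), name='attention_weights')(a)   # back to (batch, timesteps, features)
weighted_inputs = Multiply()([inputs, attention])

x = LSTM(16, return_sequences=True, activation='tanh')(weighted_inputs)
x = PReLU()(x)
x = Dropout(0.5)(x)
x = LSTM(8, return_sequences=True, activation='tanh')(x)
x = Dropout(0.1)(x)
x = PReLU()(x)
x = Flatten()(x)
outputs = Dense(1, activation='sigmoid')(x)

model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='binary_crossentropy')

# To visualize which timesteps/features are attended to for a batch X,
# build a sub-model that outputs the attention weights:
attention_model = Model(inputs, model.get_layer('attention_weights').output)
# weights = attention_model.predict(X)   # shape: (batch, timesteps, features)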

desertnaut
hzitoun

There are at least half a dozen major flavours of attention, most of which are minor variations on the first attention model, introduced by Bahdanau et al. in 2014. Each of these flavours can be implemented in multiple ways, which can be confusing for someone who just wants to add a simple attention layer to their model. Looking at your model, I would recommend adding an attention layer after your second LSTM layer. This can be a custom attention layer based on Bahdanau.

An implementation is shared here: Create an LSTM layer with Attention in Keras for multi-label text classification neural network

You could then use the 'context' vector returned by this layer to better predict whatever you want to predict: your subsequent layer (the Dense layer with the sigmoid activation) would use this context to predict more accurately.

The attention weights are also returned by this layer, and these can be routed to a simple display or plot.
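
As a very rough sketch of what such a Bahdanau-style layer might look like (an illustrative assumption, not the exact code from the linked answer or article), returning both the context vector and the attention weights:

import keras.backend as K
from keras.layers import Layer

class BahdanauAttention(Layer):
    """Additive (Bahdanau-style) attention over the timesteps of an LSTM output.
    Returns the context vector and the attention weights."""

    def build(self, input_shape):
        # input_shape: (batch, timesteps, units)
        units = input_shape[-1]
        self.W = self.add_weight(name='att_W', shape=(units, units),
                                 initializer='glorot_uniform', trainable=True)
        self.b = self.add_weight(name='att_b', shape=(units,),
                                 initializer='zeros', trainable=True)
        self.v = self.add_weight(name='att_v', shape=(units, 1),
                                 initializer='glorot_uniform', trainable=True)
        super().build(input_shape)

    def call(self, hidden_states):
        # score each timestep: (batch, timesteps)
        score = K.squeeze(K.dot(K.tanh(K.dot(hidden_states, self.W) + self.b), self.v), axis=-1)
        weights = K.expand_dims(K.softmax(score), axis=-1)   # (batch, timesteps, 1)
        context = K.sum(weights * hidden_states, axis=1)     # (batch, units)
        return [context, weights]

    def compute_output_shape(self, input_shape):
        return [(input_shape[0], input_shape[-1]),
                (input_shape[0], input_shape[1], 1)]

# Usage with the functional API, after the second LSTM of your model:
#   context, att_weights = BahdanauAttention()(lstm2_output)
#   prediction = Dense(1, activation='sigmoid')(context)
# att_weights can be exposed via a separate Model(inputs, att_weights) for plotting.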

For more specific details, please refer to https://towardsdatascience.com/create-your-own-custom-attention-layer-understand-all-flavours-2201b5e8be9e

Allohvk