
With the following code:

from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout, Flatten, PReLU

model = Sequential()

# data has shape (samples, timesteps, features)
num_features = data.shape[2]
num_samples = data.shape[1]  # i.e. the number of timesteps per sample

model.add(
    LSTM(16, batch_input_shape=(None, num_samples, num_features), return_sequences=True, activation='tanh'))
model.add(PReLU())
model.add(Dropout(0.5))
model.add(LSTM(8, return_sequences=True, activation='tanh'))
model.add(Dropout(0.1))
model.add(PReLU())
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))

I'm trying to understand how I can add an attention mechanism before the first LSTM layer. I've found the GitHub repository keras-attention-mechanism by Philippe Rémy, but couldn't figure out how exactly to use it with my code.

I would also like to visualize the attention mechanism and see which features the model focuses on.

Any help would be appreciated, especially a code modification. Thanks :)

Shlomi Schwartz
  • Here's a simple way to add attention: https://stackoverflow.com/questions/62948332/how-to-add-attention-layer-to-a-bi-lstm/62949137#62949137 – Marco Cerliani Jul 17 '20 at 14:58

2 Answers


You may find an example of how to use an LSTM with an attention mechanism in Keras in this gist:

https://gist.github.com/mbollmann/ccc735366221e4dba9f89d2aab86da1e

And in the following answer on SO:

How to add an attention mechanism in keras?

To visualize the activations, you can use the following repository: https://github.com/philipperemy/keras-activations
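
To make that concrete, below is a minimal sketch of one way to wire an attention block in front of the question's LSTM stack using the Keras functional API. The softmax-over-timesteps weighting and all layer names here are illustrative assumptions, not code copied from the gist or the linked answer:

from keras.models import Model
from keras.layers import Input, Dense, LSTM, Dropout, Flatten, PReLU, Permute, Multiply

num_timesteps, num_features = data.shape[1], data.shape[2]

inputs = Input(shape=(num_timesteps, num_features))

# "attention before the LSTM": learn softmax weights over the timesteps
# and rescale the input with them
a = Permute((2, 1))(inputs)                                # (batch, features, timesteps)
a = Dense(num_timesteps, activation='softmax')(a)          # softmax over timesteps
attention = Permute((2, 1), name='attention_weights')(a)   # back to (batch, timesteps, features)
weighted_inputs = Multiply()([inputs, attention])

x = LSTM(16, return_sequences=True, activation='tanh')(weighted_inputs)
x = PReLU()(x)
x = Dropout(0.5)(x)
x = LSTM(8, return_sequences=True, activation='tanh')(x)
x = Dropout(0.1)(x)
x = PReLU()(x)
x = Flatten()(x)
outputs = Dense(1, activation='sigmoid')(x)

model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='binary_crossentropy')

# To visualize which timesteps/features are attended to for a batch X,
# build a sub-model that outputs the attention weights:
attention_model = Model(inputs, model.get_layer('attention_weights').output)
# weights = attention_model.predict(X)   # shape: (batch, timesteps, features)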

desertnaut
hzitoun

There are at least half a dozen major flavours of attention, most of which are minor variations on the first attention model, introduced by Bahdanau et al. in 2014. Each of these flavours can be implemented in multiple ways, which can be confusing for someone who just wants to add a simple attention layer to their model. Looking at your model, I would recommend adding an attention layer after your second LSTM layer. This can be a custom attention layer based on Bahdanau.

An implementation is shared here: Create an LSTM layer with Attention in Keras for multi-label text classification neural network

You could then use the 'context' vector returned by this layer to better predict whatever you want to predict: your subsequent layer (the Dense layer with the sigmoid activation) would use this context to predict more accurately.

The attention weights are also returned by this layer, and these can be routed to a simple display or plot.
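
As a very rough sketch of what such a Bahdanau-style layer might look like (an illustrative assumption, not the exact code from the linked answer or article), returning both the context vector and the attention weights:

import keras.backend as K
from keras.layers import Layer

class BahdanauAttention(Layer):
    """Additive (Bahdanau-style) attention over the timesteps of an LSTM output.
    Returns the context vector and the attention weights."""

    def build(self, input_shape):
        # input_shape: (batch, timesteps, units)
        units = input_shape[-1]
        self.W = self.add_weight(name='att_W', shape=(units, units),
                                 initializer='glorot_uniform', trainable=True)
        self.b = self.add_weight(name='att_b', shape=(units,),
                                 initializer='zeros', trainable=True)
        self.v = self.add_weight(name='att_v', shape=(units, 1),
                                 initializer='glorot_uniform', trainable=True)
        super().build(input_shape)

    def call(self, hidden_states):
        # score each timestep: (batch, timesteps)
        score = K.squeeze(K.dot(K.tanh(K.dot(hidden_states, self.W) + self.b), self.v), axis=-1)
        weights = K.expand_dims(K.softmax(score), axis=-1)   # (batch, timesteps, 1)
        context = K.sum(weights * hidden_states, axis=1)     # (batch, units)
        return [context, weights]

    def compute_output_shape(self, input_shape):
        return [(input_shape[0], input_shape[-1]),
                (input_shape[0], input_shape[1], 1)]

# Usage with the functional API, after the second LSTM of your model:
#   context, att_weights = BahdanauAttention()(lstm2_output)
#   prediction = Dense(1, activation='sigmoid')(context)
# att_weights can be exposed via a separate Model(inputs, att_weights) for plotting.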

For more specific details, please refer to https://towardsdatascience.com/create-your-own-custom-attention-layer-understand-all-flavours-2201b5e8be9e

Allohvk