
I am using a Keras LSTM for a regression model. I would like to apply an attention mechanism to the raw inputs, as explained very well in this paper.

The paper describes how to build a custom attention layer that assigns weights to the raw inputs before they are fed to the LSTM. I would like to use the new Keras Attention layer instead. After looking around for how to implement this, I came up with the model below:

import tensorflow as tf
from tensorflow.keras import layers, regularizers

sequence_input = layers.Input(shape=(time_steps, features), dtype='float32')
lstm, state_h, state_c = layers.LSTM(units=50,
                                     dropout=0.1,
                                     activation='tanh',
                                     return_state=True,
                                     kernel_regularizer=regularizers.l2(1e-6),
                                     recurrent_regularizer=regularizers.l2(1e-4),
                                     bias_regularizer=regularizers.l2(1e-6),
                                     return_sequences=True)(sequence_input)
context_vector, attention_weights = layers.Attention()([lstm, state_h])
output = layers.Dense(units=1)(context_vector)
model = tf.keras.Model(inputs=sequence_input, outputs=output)

The above model raises the following TypeError at the Attention layer:

TypeError: Cannot iterate over a tensor with unknown first dimension.

This is because the first dimension of sequence_input, lstm, and state_h is None. I am new to the Attention layer and I am pretty sure I am missing something. The Keras documentation only has an example in which an Embedding layer follows the Input layer, which is not what I need for a forecasting regression model where each sample is a float.
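
For reference, as far as I can tell from the documentation, the layer expects a list of [query, value] tensors that are both 3D, along these lines (a toy sketch with made-up shapes, no Embedding involved):

from tensorflow.keras import layers

# toy 3D tensors of shape (batch, steps, dim); the batch dimension is None
query = layers.Input(shape=(20, 16))
value = layers.Input(shape=(20, 16))

# the layer is called on [query, value] and returns a tensor of shape (batch, 20, 16)
attended = layers.Attention()([query, value])

So I assume the problem is in how I am wiring my own tensors into the layer.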

PS: There might be other issues after the Attention layer; I have not been able to get past this one yet, so I am not claiming the rest of the implementation is correct.

Update

One issue is that I am passing a 2D tensor to Attention(). The hidden state has shape (None, 50). Shouldn't the LSTM have a hidden state for each feature, i.e. (None, 50, 10)? Following this question, it seems the hidden state should be passed as the value, so I am not sure why the dimensionality is not correct.
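
For what it's worth, my current reading of that answer is that the hidden state needs an extra time axis before it goes into Attention(), and that the layer returns a single tensor. A rough sketch of what I mean (not verified, and I am not sure which input should be the query and which the value):

from tensorflow.keras import layers

time_steps, features = 20, 10   # made-up values, just for illustration

sequence_input = layers.Input(shape=(time_steps, features))
lstm, state_h, state_c = layers.LSTM(units=50,
                                     return_state=True,
                                     return_sequences=True)(sequence_input)

# state_h is 2D (None, 50); give it a time axis so it becomes (None, 1, 50)
state_h_3d = layers.Reshape((1, 50))(state_h)

# both inputs are now 3D and the layer returns one tensor of shape (None, 1, 50)
context_vector = layers.Attention()([state_h_3d, lstm])

I would still have to collapse the extra axis (e.g. with Flatten) before the final Dense(1), but that is downstream of the error above.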

bcsta
  • The Keras Attention layer accepts 3D data; you are passing 3D and 2D... also pay attention to what you are trying to achieve – Marco Cerliani Sep 14 '20 at 09:32
  • @MarcoCerliani The returned hidden state of the lstm is (None, 50). Shouldn't there be a hidden state for each feature? – bcsta Sep 14 '20 at 09:52
  • I suggest you look at this: https://stackoverflow.com/a/61775631/10375049 – Marco Cerliani Sep 14 '20 at 09:55
  • @MarcoCerliani the answer says "value is the hidden state [batch_dim, features] where we add a temporal dimension for matrix operation [batch_dim, 1, features]". However, the hidden state is returning [batch_dim, units] instead. – bcsta Sep 14 '20 at 10:04
  • yes, for this reason, we add a temporal dimension in order to have it in 3D – Marco Cerliani Sep 14 '20 at 10:06
  • @MarcoCerliani my point is hidden state is [batch_dim, units] not [batch_dim, features] as assumed in the answer. – bcsta Sep 14 '20 at 10:09
  • it's simply a terminology mismatch; units and features can be interpreted as the same thing, the focal point is elsewhere – Marco Cerliani Sep 14 '20 at 10:12
  • The error "Cannot iterate over a tensor with unknown first dimension." persists. Probably because the size of the first dimension of 'lstm' and 'state_h' is 'None' – bcsta Sep 14 '20 at 10:15
  • pay attention... the Attention layer returns one element, not 2... it is context_vector = Attention()([...]), not context_vector, attention_weights = Attention()([...]) – Marco Cerliani Sep 14 '20 at 10:21
  • about your edited question: attention calculates a dot product, that's why the last dimension is reduced. – runDOSrun Sep 14 '20 at 11:08

0 Answers