
I am using a Keras LSTM for a regression model. I would like to apply an attention mechanism to the raw inputs, as explained very well in this paper.

The paper describes how to build a custom attention layer that assigns weights to the raw inputs before they are fed to the LSTM. I would like to use the new Keras Attention layer instead. After looking around for how to implement this, I came up with the model below:

import tensorflow as tf
from tensorflow.keras import layers, regularizers

sequence_input = layers.Input(shape=(time_steps, features), dtype='float32')
lstm, state_h, state_c = layers.LSTM(units=50,
                                     dropout=0.1,
                                     activation='tanh',
                                     return_state=True,
                                     kernel_regularizer=regularizers.l2(1e-6),
                                     recurrent_regularizer=regularizers.l2(1e-4),
                                     bias_regularizer=regularizers.l2(1e-6),
                                     return_sequences=True)(sequence_input)
context_vector, attention_weights = layers.Attention()([lstm, state_h])
output = layers.Dense(units=1)(context_vector)
model = tf.keras.Model(inputs=sequence_input, outputs=output)

The above model raises the following TypeError at the Attention layer:

TypeError: Cannot iterate over a tensor with unknown first dimension.

This is because the first dimension of sequence_input, lstm, and state_h is None. I am new to the Attention layer and I am pretty sure I am missing something. The Keras documentation only has an example in which an Embedding layer follows the Input layer, which is not what I need for a forecasting regression model where each sample is a float.
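
For reference, as far as I can tell from the documentation, the layer expects a list of [query, value] tensors that are both 3D, along these lines (a toy sketch with made-up shapes, no Embedding involved):

from tensorflow.keras import layers

# toy 3D tensors of shape (batch, steps, dim); the batch dimension is None
query = layers.Input(shape=(20, 16))
value = layers.Input(shape=(20, 16))

# the layer is called on [query, value] and returns a tensor of shape (batch, 20, 16)
attended = layers.Attention()([query, value])

So I assume the problem is in how I am wiring my own tensors into the layer.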

PS: There might be other issues after the Attention layer; I have not been able to get past this one yet, so I am not claiming the rest of the implementation is correct.

Update

One issue is that I am passing a 2D tensor to Attention(). The hidden state has shape (None, 50). Shouldn't the LSTM have a hidden state for each feature, i.e. (None, 50, 10)? Following this question, it seems the hidden state should be passed as the value, so I am not sure why the dimensionality is not correct.
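
For what it's worth, my current reading of that answer is that the hidden state needs an extra time axis before it goes into Attention(), and that the layer returns a single tensor. A rough sketch of what I mean (not verified, and I am not sure which input should be the query and which the value):

from tensorflow.keras import layers

time_steps, features = 20, 10   # made-up values, just for illustration

sequence_input = layers.Input(shape=(time_steps, features))
lstm, state_h, state_c = layers.LSTM(units=50,
                                     return_state=True,
                                     return_sequences=True)(sequence_input)

# state_h is 2D (None, 50); give it a time axis so it becomes (None, 1, 50)
state_h_3d = layers.Reshape((1, 50))(state_h)

# both inputs are now 3D and the layer returns one tensor of shape (None, 1, 50)
context_vector = layers.Attention()([state_h_3d, lstm])

I would still have to collapse the extra axis (e.g. with Flatten) before the final Dense(1), but that is downstream of the error above.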

bcsta
  • The Keras Attention layer accepts 3D data; you are passing 3D and 2D... also pay attention to what you are trying to achieve – Marco Cerliani Sep 14 '20 at 09:32
  • @MarcoCerliani The returned hidden state of the lstm is (None, 50). Shouldn't there be a hidden state for each feature? – bcsta Sep 14 '20 at 09:52
  • I suggest you look at this: https://stackoverflow.com/a/61775631/10375049 – Marco Cerliani Sep 14 '20 at 09:55
  • @MarcoCerliani the answer says "value is the hidden state [batch_dim, features] where we add a temporal dimension for matrix operation [batch_dim, 1, features]". However, the hidden state is returning [batch_dim, units] instead. – bcsta Sep 14 '20 at 10:04
  • yes, for this reason, we add a temporal dimension in order to have it in 3D – Marco Cerliani Sep 14 '20 at 10:06
  • @MarcoCerliani my point is hidden state is [batch_dim, units] not [batch_dim, features] as assumed in the answer. – bcsta Sep 14 '20 at 10:09
  • it's simply a terminology mismatch; units and features can be interpreted as the same thing, the focal point is elsewhere – Marco Cerliani Sep 14 '20 at 10:12
  • The error "Cannot iterate over a tensor with unknown first dimension." persists. Probably because the size of the first dimension of 'lstm' and 'state_h' is 'None' – bcsta Sep 14 '20 at 10:15
  • pay attention... the Attention layer returns one element, not 2... it is context_vector = Attention()([...]), not context_vector, attention_weights = Attention()([...]) – Marco Cerliani Sep 14 '20 at 10:21
  • about your edited question: attention calculates a dot product, that's why the last dimension is reduced. – runDOSrun Sep 14 '20 at 11:08

0 Answers