I am using a Keras LSTM for a regression model. I would like to apply an attention mechanism to the raw inputs, as explained very well in this paper. The paper describes how to build a custom attention layer that weights the raw inputs before they are fed to the LSTM. Instead, I would like to use the new Keras Attention layer. After looking around for how to implement this, I came up with the model below:
sequence_input = layers.Input(shape=(time_steps, features), dtype='float32')
lstm, state_h, state_c = layers.LSTM(units=50,
                                     dropout=0.1,
                                     activation='tanh',
                                     return_state=True,
                                     kernel_regularizer=regularizers.l2(1e-6),
                                     recurrent_regularizer=regularizers.l2(1e-4),
                                     bias_regularizer=regularizers.l2(1e-6),
                                     return_sequences=True)(sequence_input)
context_vector, attention_weights = layers.Attention()([lstm, state_h])
output = layers.Dense(units=1)(context_vector)
model = tf.keras.Model(inputs=sequence_input, outputs=output)
The above model raises the following TypeError when the Attention layer is called:
TypeError: Cannot iterate over a tensor with unknown first dimension.
This is because the first dimension of sequence_input, lstm, and state_h is None. I am new to the Attention layer and I am pretty sure I am missing something. The Keras documentation only has an example where an Embedding layer follows the Input layer, which is not what I need for a forecasting regression model where each sample is a float.
PS: There might be other issues after the Attention layer; I have not been able to get past this one yet, so I am not claiming the rest of the implementation is correct.
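My current understanding (an assumption on my part, not something the docs' Embedding example shows) is that Attention() expects a list [query, value] of 3-D tensors shaped (batch, time_steps, dim), so the 2-D state_h needs a time axis before it can serve as the query. A minimal sketch of that idea, with hard-coded shapes (time_steps=20, features=10) purely for illustration:

```python
import tensorflow as tf
from tensorflow.keras import layers

time_steps, features, units = 20, 10, 50  # illustrative values only

sequence_input = layers.Input(shape=(time_steps, features), dtype='float32')
lstm, state_h, state_c = layers.LSTM(units,
                                     return_state=True,
                                     return_sequences=True)(sequence_input)

# state_h is 2-D: (batch, units). Give it a time axis of length 1
# so it becomes a valid 3-D query: (batch, 1, units).
query = layers.Reshape((1, units))(state_h)

# Attention returns a single tensor (batch, 1, units) by default;
# the query attends over the full LSTM output sequence (the value).
context_vector = layers.Attention()([query, lstm])

# Drop the singleton time axis before the regression head.
context_vector = layers.Flatten()(context_vector)     # (batch, units)
output = layers.Dense(1)(context_vector)
model = tf.keras.Model(inputs=sequence_input, outputs=output)
```

With this shape handling the model builds without the TypeError, although I cannot say whether it is the intended use of the layer.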
Update
One issue is that I am passing a 2D tensor to Attention(): the hidden state has shape (None, 50). Shouldn't the LSTM have a hidden state for each feature, i.e. (None, 50, 10)? Following this question, it seems the hidden state should be passed as the value, so I am not sure why the dimensionality is incorrect.
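In the meantime, one variant that does build for me (my own experiment, not taken from the linked question) is plain self-attention over the LSTM output sequence: both query and value are the 3-D sequence tensor, so the 2-D hidden state is avoided entirely:

```python
import tensorflow as tf
from tensorflow.keras import layers

time_steps, features = 20, 10  # illustrative values only

sequence_input = layers.Input(shape=(time_steps, features), dtype='float32')
lstm = layers.LSTM(50, return_sequences=True)(sequence_input)

# Self-attention: the sequence attends to itself, so both query and
# value are the 3-D LSTM output (batch, time_steps, units).
attended = layers.Attention()([lstm, lstm])           # (batch, time_steps, 50)

# Collapse the time axis before the regression head.
pooled = layers.GlobalAveragePooling1D()(attended)    # (batch, 50)
output = layers.Dense(1)(pooled)
model = tf.keras.Model(inputs=sequence_input, outputs=output)
```

This compiles, but I do not know whether it captures the same idea as the attention-over-raw-inputs scheme from the paper.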