
While building a model that uses TensorFlow 2.0 Attention, I followed the example given in the TF docs: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Attention

The last line in the example is

input_layer = tf.keras.layers.Concatenate()(
    [query_encoding, query_value_attention])

Then the example has the comment

# Add DNN layers, and create Model.
# ...

So it seemed logical to do this

model = tf.keras.Sequential()
model.add(input_layer)

This produces the error

TypeError: The added layer must be an instance of class Layer.
Found: Tensor("concatenate/Identity:0", shape=(None, 200), dtype=float32)
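
For context on what the error is saying: tf.keras.layers.Concatenate() is a Layer instance, but calling it on a list of tensors (as in the docs example) returns a symbolic tensor, and Sequential.add only accepts Layer instances. A minimal illustration of that distinction, with hypothetical inputs:

import tensorflow as tf

concat_layer = tf.keras.layers.Concatenate()   # a Layer instance -> can be passed to Sequential.add
a = tf.keras.Input(shape=(4,))
b = tf.keras.Input(shape=(4,))
concat_tensor = concat_layer([a, b])           # a symbolic Tensor -> cannot be passed to Sequential.add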

UPDATE (after @thushv89's response)

What I am ultimately trying to do is add an attention layer to the following model, which works well (or convert it to an attention model).

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential()
model.add(layers.Embedding(vocab_size, embedding_nodes, input_length=max_length))
model.add(layers.LSTM(20))
# add attention here?
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_error', metrics=['accuracy'])

My data looks like this

4912,5059,5079,0
4663,5145,5146,0
4663,5145,5146,0
4840,5117,5040,0

The first three columns are the inputs and the last column is the binary target; the goal is classification. The data was prepared similarly to this example, which has a similar purpose (binary classification): https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/
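
Since the three input columns are already integer token IDs, preparing them for an Embedding layer mostly comes down to splitting features from labels and padding to a fixed length. A minimal sketch, assuming the rows above live in a hypothetical data.csv:

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Each row: three integer token IDs followed by a binary label.
rows = np.loadtxt('data.csv', delimiter=',', dtype='int32')  # hypothetical file name

X = rows[:, :-1]           # first three columns are the input sequence
y = rows[:, -1]            # last column is the binary target

max_length = 3             # sequences here are already length 3
X = pad_sequences(X, maxlen=max_length, padding='post')

vocab_size = int(X.max()) + 1   # Embedding expects ids in [0, vocab_size)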

greco.roamin

1 Answer


So, the first thing to know is that Keras has three APIs for creating models (a minimal side-by-side sketch follows this list).

  • Sequential - (Which is what you're doing here)
  • Functional - (Which is what I'm using in the solution)
  • Subclassing - Creating Python classes to represent custom models/layers
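
To make the distinction concrete, here is the same tiny two-layer model written both ways; the layer sizes are arbitrary and only illustrate the two styles:

import tensorflow as tf

# Sequential: a linear stack, layers are added one after another.
seq_model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(8,)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# Functional: tensors are passed through layers explicitly, which allows
# branching and multiple inputs/outputs (as in the attention example below).
inputs = tf.keras.Input(shape=(8,))
x = tf.keras.layers.Dense(16, activation='relu')(inputs)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)
func_model = tf.keras.Model(inputs=inputs, outputs=outputs)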

The model created in the tutorial is not meant to be built with the Sequential API; it uses the Functional API. So you have to do the following. Note that I've taken the liberty of defining the dense layers with arbitrary parameters (e.g. the number of output classes), which you can change as needed.

import tensorflow as tf

# Variable-length int sequences.
query_input = tf.keras.Input(shape=(None,), dtype='int32')
value_input = tf.keras.Input(shape=(None,), dtype='int32')

# ... the code in the middle

# Concatenate query and document encodings to produce a DNN input layer.
input_layer = tf.keras.layers.Concatenate()(
    [query_encoding, query_value_attention])

# Add DNN layers, and create Model.
# ...
dense_out = tf.keras.layers.Dense(50, activation='relu')(input_layer)
pred = tf.keras.layers.Dense(10, activation='softmax')(dense_out)

model = tf.keras.models.Model(inputs=[query_input, value_input], outputs=pred)
model.summary()
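
Because this model is built with two Input layers (query and value), it has to be fed a list of two arrays when training or predicting. A minimal usage sketch with made-up data, assuming the elided middle of the docs example has been filled in; the compile settings are arbitrary:

import numpy as np

# Hypothetical integer-id batches (ids assumed to fall inside the embedding's vocabulary):
queries = np.random.randint(0, 1000, size=(32, 5))   # queries of length 5
values = np.random.randint(0, 1000, size=(32, 8))    # values of length 8
labels = np.random.randint(0, 10, size=(32,))        # 10 classes to match the softmax above

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# The two Input layers mean fit() takes a list of two arrays.
model.fit([queries, values], labels, epochs=2)
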
thushv89
  • Thanks, that was helpful, but it raises another issue. I get the model to compile and the summary is close to what I'm looking for, but why two input arrays and what are they? I amended my question based on your answer. I'm doing something similar to the following link, and I only have one input array. https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/ – greco.roamin Nov 24 '19 at 02:01
  • This is not a typical attention layer. It is designed to be used as the attention module in a Transformer model, which is quite different. What I'm not sure about is whether this attention can be used to implement the type of attention used in an NLP model. – thushv89 Nov 24 '19 at 02:56
  • @greco.roamin actually, it should be able to be used as a typical attention layer. I'll look into it and update my answer accordingly. – thushv89 Nov 24 '19 at 03:02
  • Thank you. Very much appreciate the advice. If it helps, this is also my question on the same problem, from before I realized how much more involved an Attention layer needs to be; it might give more context. https://stackoverflow.com/questions/58966874/adding-attention-on-top-of-simple-lstm-layer-in-tensorflow-2-0/ This post is also doing something similar. The point is to implement Attention for classification, not for encoding/decoding: https://www.depends-on-the-definition.com/attention-lstm-relation-classification/ – greco.roamin Nov 24 '19 at 03:30
  • @greco.roamin so after some research, I don't think you can use this attention layer to solve your problem. This attention is for encoder-decoder models. That said, you should be able to implement an "attention" layer for this problem (although, from what I've seen, it is not common to do this for non-encoder-decoder models). Let me know if you want to pursue this path. – thushv89 Nov 24 '19 at 03:59
  • I have implemented an attention layer from this post https://www.analyticsvidhya.com/blog/2019/11/comprehensive-guide-attention-mechanism-deep-learning/ and it seems to work. The accuracy of the model does not improve, but it trains several orders of magnitude faster than any equivalent model without attention. I would appreciate any comment or suggestion on the approach, or any other solutions you'd like to offer. – greco.roamin Nov 24 '19 at 15:45
  • @greco.roamin do you have an implementation for that, i.e. what you've come up with so far? – thushv89 Nov 25 '19 at 00:05
  • I used the attention function in the above-mentioned post as-is. My implementation is straightforward: I just built a sequential model and added the attention layer, and it seems to work. Here is the code, with \ denoting a new line (a cleaned-up sketch follows these comments). model = tf.keras.Sequential() \ model.add(layers.Embedding(vocab_size, embedding_nodes, input_length=max_length)) \ model.add(layers.LSTM(20, return_sequences=True)) \ model.add(attention()) \ model.add(layers.Dense(1, activation='sigmoid')) \ model.compile(loss='mean_squared_error', metrics=['accuracy']) – greco.roamin Nov 25 '19 at 21:32
  • @greco.roamin so is it all working now? Or do you still have issues with the training? – thushv89 Nov 26 '19 at 09:04
  • It's working as far as I can tell. I'm getting some interesting results, and it trains fast and well. This is pure experimentation to see the effect of attention on data for which it was not specifically designed. What I need to do now is remove the entire concept of an embedding layer for a generalized attention solution. I don't expect to solve that here and now, but I appreciate your insight and help on this step in the process. Thank you. – greco.roamin Nov 26 '19 at 15:37
  • Great to hear that. Good luck! :) – thushv89 Nov 26 '19 at 19:40
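
For reference, a cleaned-up sketch of the approach described in the comments above: a small custom additive-attention layer over the LSTM's timesteps, dropped into the asker's Sequential model. This is paraphrased in the spirit of the layer in the linked Analytics Vidhya post, not its exact code, and the vocabulary/embedding/length values are placeholders:

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import backend as K

class Attention(layers.Layer):
    """Additive attention over timesteps: one score per timestep,
    softmax over time, weighted sum of the LSTM hidden states."""

    def build(self, input_shape):
        # input_shape: (batch, timesteps, features)
        self.W = self.add_weight(name='att_weight', shape=(input_shape[-1], 1),
                                 initializer='glorot_uniform', trainable=True)
        self.b = self.add_weight(name='att_bias', shape=(input_shape[1], 1),
                                 initializer='zeros', trainable=True)
        super().build(input_shape)

    def call(self, x):
        e = K.tanh(K.dot(x, self.W) + self.b)   # (batch, timesteps, 1) scores
        a = K.softmax(e, axis=1)                # attention weights over the time axis
        return K.sum(x * a, axis=1)             # context vector of shape (batch, features)

# Placeholder hyperparameters; substitute your own vocab_size, embedding_nodes, max_length.
vocab_size, embedding_nodes, max_length = 6000, 8, 3

model = tf.keras.Sequential([
    layers.Embedding(vocab_size, embedding_nodes, input_length=max_length),
    layers.LSTM(20, return_sequences=True),   # return_sequences so attention sees every timestep
    Attention(),
    layers.Dense(1, activation='sigmoid'),
])
model.compile(loss='mean_squared_error', metrics=['accuracy'])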