
The expectation here is that attention is applied along the 2nd dimension of the (4, 5, 20, 64) tensor. I am trying to apply self-attention using the following code (the issue is reproducible with this code):

import numpy as np
import tensorflow as tf
from keras import layers as tfl

class Encoder(tfl.Layer):
    def __init__(self,):
        super().__init__()
        self.embed_layer = tfl.Embedding(4500, 64, mask_zero=True)
        self.attn_layer = tfl.MultiHeadAttention(num_heads=2,
                                                 attention_axes=2,
                                                 key_dim=16)
        return

    def call(self, x):
        # Input shape: (4, 5, 20) (Batch size: 4)
        x = self.embed_layer(x)  # Output: (4, 5, 20, 64)
        x = self.attn_layer(query=x, key=x, value=x)  # Output: (4, 5, 20, 64)
        return x


eg_input = tf.constant(np.random.randint(0, 150, (4, 5, 20)))
enc = Encoder()
enc(eg_input)

However, the layer defined above throws the following error. Could someone please explain why this is happening and how to fix it?

{{function_node __wrapped__AddV2_device_/job:localhost/replica:0/task:0/device:CPU:0}} Incompatible shapes: [4,5,2,20,20] vs. [4,5,1,5,20] [Op:AddV2]

Call arguments received by layer 'softmax_2' (type Softmax):
  • inputs=tf.Tensor(shape=(4, 5, 2, 20, 20), dtype=float32)
  • mask=tf.Tensor(shape=(4, 5, 1, 5, 20), dtype=bool)

PS: If I set mask_zero=False when defining the embedding layer, the code runs as expected without any issues.
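
A quick way to confirm that the padding mask is the culprit (my own check, not part of the original post) is to inspect the mask the embedding layer computes for the example input:

import numpy as np
import tensorflow as tf
from keras import layers as tfl

eg_input = tf.constant(np.random.randint(0, 150, (4, 5, 20)))
emb = tfl.Embedding(4500, 64, mask_zero=True)
mask = emb.compute_mask(eg_input)  # True wherever the token id is non-zero
print(mask.shape)                  # (4, 5, 20) -- this 3-D padding mask is what MultiHeadAttention
                                   # later tries to combine with the 5-D attention scores,
                                   # which is where the shape mismatch in the error comes from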

  • Thanks V.M. I wanted to apply attention on the 2nd dimension (on the 20 x 64 matrix specifically); merging the first and second dimensions would lead to attention being applied over both dimensions together. – Vidyadhar Mudium Nov 29 '22 at 07:10

1 Answer


Just concatenate the input along axis=0 before passing it to the attention layer:

import numpy as np
import tensorflow as tf
from keras import layers as tfl

class Encoder(tfl.Layer):
    def __init__(self,):
        super().__init__()
        self.embed_layer = tfl.Embedding(4500, 64, mask_zero=True)
        self.attn_layer = tfl.MultiHeadAttention(num_heads=2,
                                                 key_dim=16,
                                                 attention_axes=2)

    def call(self, x):
        x = self.embed_layer(x)  # Output: (4, 5, 20, 64)
        x = tf.concat(x, axis=0)  # concat of a single tensor: values and shape are unchanged
        x, attention_scores = self.attn_layer(query=x, key=x, value=x,
                                              return_attention_scores=True)  # Output: (4, 5, 20, 64)
        return x, attention_scores


eg_input = tf.constant(np.random.randint(0, 150, (4, 5, 20)))
enc = Encoder()
output, attention_scores = enc(eg_input)
output.shape, attention_scores.shape
# (TensorShape([4, 5, 20, 64]), TensorShape([4, 5, 2, 20, 20]))
Mohammad Ahmed
  • Thanks Ahmed!! That worked; could you please explain what the issue was earlier? – Vidyadhar Mudium Nov 29 '22 at 09:31
  • The issue is that TensorFlow builds a mask automatically when the input is passed into self-attention, but in your case the dimensions of the `attention_scores` and the `mask` did not match. The `tf.concat` call is a little trick to get the masking right: if you compare the inputs before and after the concat, nothing changes in their dimensions or values, yet that is what makes it work. – Mohammad Ahmed Nov 29 '22 at 09:38
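
As a follow-up sketch of why the trick works (my own reading, not from the answer): tf.concat applied to a single tensor is effectively a pass-through (internally a tf.identity), so the values and shape are untouched; what the returned tensor most likely loses is the Keras mask the embedding layer attached, so MultiHeadAttention no longer receives the mismatched mask.

import numpy as np
import tensorflow as tf
from keras import layers as tfl

eg_input = tf.constant(np.random.randint(0, 150, (4, 5, 20)))
x = tfl.Embedding(4500, 64, mask_zero=True)(eg_input)
y = tf.concat(x, axis=0)                        # a single tensor, so effectively tf.identity
print(tf.reduce_all(x == y).numpy())            # True -- values and shape are identical
print(getattr(x, "_keras_mask", None) is None)  # with tf.keras / Keras 2 this is expected to be False:
                                                # the embedding output carries the (4, 5, 20) padding mask
print(getattr(y, "_keras_mask", None) is None)  # expected to be True: the mask does not survive the concat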