Questions tagged [multihead-attention]
18 questions
1 vote · 1 answer
How to read a BERT attention weight matrix?
I have extracted the attention score/weight matrix from the last layer and the last attention head of my BERT model. However, I am not sure how to read it. The matrix is the following one. I tried to find some more information in the…

Chiara · 372
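A minimal sketch of how such a matrix can be extracted and read with the Hugging Face transformers API (the checkpoint and sentence are illustrative): each row of the (seq_len, seq_len) matrix is one token's softmax-normalized attention distribution over all tokens, so every row sums to 1.

import torch
from transformers import BertTokenizer, BertModel

# Illustrative checkpoint; any BERT model works the same way with output_attentions=True.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("The cat sat on the mat", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
last_layer_last_head = outputs.attentions[-1][0, -1]   # (seq_len, seq_len)

# Row i is token i's attention distribution over all tokens (including [CLS]/[SEP]).
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
print(tokens)
print(last_layer_last_head.sum(dim=-1))                # each row sums to ~1.0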
1 vote · 0 answers
How to access the value projection in the MultiheadAttention layer in PyTorch
I'm writing my own implementation of the Graphormer architecture. Since this architecture needs to add an edge-based bias to the output of the key-query multiplication in the self-attention mechanism, I am adding that bias by hand and doing the…

Angelo · 575
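For reference, a sketch of one way to reach the value projection of nn.MultiheadAttention when query, key, and value share the same dimension; the packed in_proj_weight layout assumed here is how current PyTorch stores the three projections.

import torch
import torch.nn as nn

embed_dim, num_heads = 8, 2
mha = nn.MultiheadAttention(embed_dim, num_heads, bias=False, batch_first=True)

x = torch.randn(1, 5, embed_dim)                 # (batch, seq, embed_dim)

# With equal q/k/v dimensions the three projections are packed into
# in_proj_weight of shape (3 * embed_dim, embed_dim):
# rows 0:E are W_q, E:2E are W_k, 2E:3E are W_v.
w_v = mha.in_proj_weight[2 * embed_dim:]         # (embed_dim, embed_dim)
value_projection = x @ w_v.T                     # V = x W_v^T, split into heads later
print(value_projection.shape)                    # torch.Size([1, 5, 8])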
1 vote · 1 answer
Multi-head attention calculation
I create a model with a multi-head attention layer:
import torch
import torch.nn as nn
query = torch.randn(2, 4)
key = torch.randn(2, 4)
value = torch.randn(2, 4)
model = nn.MultiheadAttention(4, 1, bias=False)
model(query, key, value)
I attempt…

apostofes · 2,959
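A sketch of reproducing that layer's output by hand under the same settings (single head, no bias, unbatched (seq_len, embed_dim) inputs as in the snippet above); attribute names follow the current PyTorch implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

embed_dim = 4
query = torch.randn(2, embed_dim)
key = torch.randn(2, embed_dim)
value = torch.randn(2, embed_dim)
mha = nn.MultiheadAttention(embed_dim, num_heads=1, bias=False)

# Unpack the packed projection weights: rows 0:E -> W_q, E:2E -> W_k, 2E:3E -> W_v.
w_q, w_k, w_v = mha.in_proj_weight.chunk(3)
q, k, v = query @ w_q.T, key @ w_k.T, value @ w_v.T

# Scaled dot-product attention followed by the output projection.
attn = F.softmax(q @ k.T / embed_dim ** 0.5, dim=-1)
manual_out = (attn @ v) @ mha.out_proj.weight.T

ref_out, _ = mha(query, key, value)
print(torch.allclose(manual_out, ref_out, atol=1e-5))  # True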
0 votes · 0 answers
How to convert TensorFlow multi-head attention to PyTorch?
I'm converting a TensorFlow transformer model to its PyTorch equivalent.
In the multi-head attention part of the TF code I have:
att = layers.MultiHeadAttention(num_heads=6, key_dim=4)
and the input shape is [None, 136, 4] where None is the batch size, 136…

ORC · 3
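Worth noting for this conversion: Keras sets the per-head width directly via key_dim, while PyTorch derives it as embed_dim // num_heads, so the two layers are not drop-in equivalents. A rough sketch of the difference, with an arbitrary batch size of 8 standing in for None:

import tensorflow as tf
import torch
import torch.nn as nn

x_tf = tf.random.normal((8, 136, 4))              # (batch, seq, features) as in the question
keras_mha = tf.keras.layers.MultiHeadAttention(num_heads=6, key_dim=4)
print(keras_mha(x_tf, x_tf).shape)                # (8, 136, 4): 6 heads of width 4 inside,
                                                  # output projected back to the input width

# PyTorch requires embed_dim to be divisible by num_heads, so 4 features with 6 heads
# is not representable directly; the closest shape-wise is a 24-dim embedding.
x_pt = torch.randn(8, 136, 24)
torch_mha = nn.MultiheadAttention(embed_dim=24, num_heads=6, batch_first=True)
out, _ = torch_mha(x_pt, x_pt, x_pt)
print(out.shape)                                  # torch.Size([8, 136, 24])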
0 votes · 1 answer
Inputs and Outputs Mismatch of Multi-head Attention Module (TensorFlow vs. PyTorch)
I am trying to convert my TensorFlow model's layers.MultiHeadAttention module from tf.keras to nn.MultiheadAttention from torch.nn. Below are the snippets.
TensorFlow Multi-head Attention
import numpy as np
import tensorflow as tf
from…
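A frequent source of such mismatches is the calling convention rather than the math: tf.keras returns a single tensor by default, while torch.nn returns an (output, weights) tuple and expects (seq, batch, embed) inputs unless batch_first=True. A small illustration with made-up shapes:

import tensorflow as tf
import torch
import torch.nn as nn

x_tf = tf.random.normal((2, 10, 16))
out_tf = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=4)(x_tf, x_tf)
print(out_tf.shape)                               # (2, 10, 16) -- a single tensor

x_pt = torch.randn(2, 10, 16)
mha = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
out_pt, attn_weights = mha(x_pt, x_pt, x_pt)      # tuple: output and head-averaged weights
print(out_pt.shape, attn_weights.shape)           # (2, 10, 16) and (2, 10, 10)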
0 votes · 0 answers
ValueError: could not broadcast input array from shape (64,64) into shape (1,)
# Get the attention scores for the specific image and layer
attention_scores = attention_model.predict(normalized_image[np.newaxis, ...])
# Normalize the attention scores to [0, 1]
normalized_attention_scores = attention_scores /…
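For context, a minimal sketch of min-max normalizing a single 2-D attention map to [0, 1] with NumPy; the (64, 64) shape comes from the error message and the variable names are illustrative:

import numpy as np

attention_scores = np.random.rand(64, 64)    # one head's attention map (illustrative)

# Min-max normalize elementwise so values fall in [0, 1];
# the small epsilon guards against a constant map.
lo, hi = attention_scores.min(), attention_scores.max()
normalized = (attention_scores - lo) / (hi - lo + 1e-8)
print(normalized.shape, normalized.min(), normalized.max())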
0 votes · 0 answers
Pretrained CNN model training with multi-head attention
I trained an EfficientNetB0 model after adding two multi-head attention layers, but when training the model I get the following warning.
Epoch: 1 | train_loss: 2.0100 | train_acc: 0.2708 | validation_loss: 1.7110 | validation_acc:…

Himali · 11
0 votes · 0 answers
How to insert a multi-head attention layer into a pretrained EfficientNetB0 model using PyTorch
I want to insert several multi-head attention layers into a pretrained EfficientNetB0 model using PyTorch. After each sequential block, I want to add a multi-head attention layer.
I tried to do this by …

Himali · 11
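A hedged sketch of one way to do this with torchvision's efficientnet_b0: wrap each feature block with a module that flattens the spatial grid into a token sequence, applies self-attention with a residual connection, and restores the map. The SpatialSelfAttention name, the head count, and the channel-inference trick are illustrative choices, not the asker's code.

import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

class SpatialSelfAttention(nn.Module):
    """Flattens HxW into a sequence, applies self-attention, restores the feature map."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x):                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)       # (B, H*W, C)
        out, _ = self.attn(seq, seq, seq)
        return x + out.transpose(1, 2).reshape(b, c, h, w)   # residual connection

backbone = efficientnet_b0(weights=None)         # use pretrained weights in practice
blocks = []
for block in backbone.features:
    blocks.append(block)
    # Infer the block's output channels from its last conv layer and attach attention.
    channels = [m.out_channels for m in block.modules() if isinstance(m, nn.Conv2d)]
    if channels:
        blocks.append(SpatialSelfAttention(channels[-1], num_heads=4))
backbone.features = nn.Sequential(*blocks)

# Small input keeps the early attention maps cheap for a quick smoke test.
print(backbone.features(torch.randn(1, 3, 64, 64)).shape)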
0 votes · 0 answers
Exception encountered when calling layer 'tft_multi_head_attention' (type TFTMultiHeadAttention)
I am trying to build a forecasting model with the Temporal Fusion Transformer (TFT) module. I get the error below when trying to train the model. Since I am new to TensorFlow, I can't fully understand what it means. I thought that simply…

Navneet · 3
0 votes · 0 answers
How do I get my transformer model to produce an output?
I built and trained a transformer model using the following code:
from tempfile import TemporaryDirectory
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as data
import math
import pandas as pd
import…

Rome Drori · 1
0 votes · 0 answers
Are the WQ, WK, WV matrices used to generate the query, key, and value vectors for attention in Transformers fixed, or do they depend on the input word?
To calculate self-attention, for each word we create a query vector, a key vector, and a value vector.
These vectors are created by multiplying the embedding by three matrices that we trained during the training process, defined as WQ, WK, WV…

Vinay Sharma · 319
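To make the distinction concrete: WQ, WK, and WV are fixed, learned parameters shared across all positions and all inputs, while the query, key, and value vectors are input-dependent because they are products of those matrices with each token's embedding. A small sketch with arbitrary dimensions:

import torch
import torch.nn as nn

d_model = 8
# W_q, W_k, W_v are learned once and reused for every token of every input.
W_q, W_k, W_v = (nn.Linear(d_model, d_model, bias=False) for _ in range(3))

x1 = torch.randn(5, d_model)    # embeddings for one 5-token sentence
x2 = torch.randn(7, d_model)    # embeddings for a different 7-token sentence

# Same matrices, different q/k/v vectors, because the embeddings differ.
q1, k1, v1 = W_q(x1), W_k(x1), W_v(x1)
q2, k2, v2 = W_q(x2), W_k(x2), W_v(x2)
print(q1.shape, q2.shape)       # torch.Size([5, 8]) torch.Size([7, 8])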
0 votes · 0 answers
When I load a saved model, KeyError: 'query_shape' occurs in Keras
I followed this time-series classification Transformer model example in Keras.
When I trained and validated it in one file, it worked well.
But when I saved that model and imported it into the test file using
loadModel =…

Yang · 161
0 votes · 0 answers
How to extract individual attention matrices from each head inside a MultiheadAttention module in a custom PyTorch Transformer model?
I have implemented a custom Transformer model using PyTorch. My model is primarily based on nn.TransformerEncoder and nn.TransformerEncoderLayer. Here is my code:
import torch.nn as nn
from torch import Tensor
import torch
import math
from torch.nn…

Monsieur AZERTY · 103
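One approach that works with reasonably recent PyTorch (average_attn_weights was added in 1.11): call the encoder layer's internal self_attn with need_weights=True and average_attn_weights=False to get per-head weights of shape (batch, num_heads, seq_len, seq_len). A minimal sketch; it calls the attention on the raw input, outside the layer's normalization, so capturing the weights of a full encoder pass exactly would need a forward hook instead.

import torch
import torch.nn as nn

embed_dim, num_heads, seq_len, batch = 16, 4, 10, 2
encoder_layer = nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True)
x = torch.randn(batch, seq_len, embed_dim)

# Ask the internal MultiheadAttention for unaveraged weights:
# shape (batch, num_heads, seq_len, seq_len), one matrix per head.
_, per_head = encoder_layer.self_attn(
    x, x, x, need_weights=True, average_attn_weights=False
)
print(per_head.shape)   # torch.Size([2, 4, 10, 10])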
0 votes · 1 answer
Running speed of PyTorch MultiheadAttention compared to Torchvision MViT
I am currently experimenting with my model, which uses the Torchvision implementation of MViT_v2_s as the backbone. I added a few cross-attention modules to the model, which look roughly like this:
class FusionModule(nn.Module):
def __init__(self,…

whz · 11
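For readers landing here, a minimal, hypothetical cross-attention block of the kind described; the FusionModule name, dimensions, and head count below are illustrative, not the asker's actual code.

import torch
import torch.nn as nn

class FusionModule(nn.Module):
    """Hypothetical cross-attention block: queries from one stream, keys/values from another."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, context):          # x, context: (B, seq, dim)
        fused, _ = self.attn(x, context, context)
        return self.norm(x + fused)         # residual + norm

video_tokens = torch.randn(2, 196, 96)      # e.g. backbone tokens
aux_tokens = torch.randn(2, 32, 96)         # tokens from a second stream
print(FusionModule(96)(video_tokens, aux_tokens).shape)   # torch.Size([2, 196, 96])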
0 votes · 0 answers
Positional Embedding in Transformers - Time Series Data
I'm adding multi-head attention at the input of my CNN to improve the interpretability and explainability of my model. The data is a 3D time-series input of shape (125, 5, 6), where the 5×6 part represents the data in a single sample and 125…

AMcoding · 3
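One common way to handle this shape is to flatten each 5×6 sample into a 30-dimensional token and add a standard sinusoidal encoding over the 125 time steps; the batch size, head count, and flattening choice below are illustrative assumptions.

import math
import torch
import torch.nn as nn

def sinusoidal_positional_encoding(seq_len, d_model):
    """Standard sinusoidal encoding, returned as a (seq_len, d_model) tensor."""
    position = torch.arange(seq_len).unsqueeze(1).float()
    div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

batch = torch.randn(8, 125, 5, 6)            # (batch, time, 5, 6) as in the question
tokens = batch.flatten(2)                    # each 5x6 sample becomes a 30-dim token
tokens = tokens + sinusoidal_positional_encoding(125, 30)

attn = nn.MultiheadAttention(embed_dim=30, num_heads=5, batch_first=True)
out, _ = attn(tokens, tokens, tokens)
print(out.shape)                             # torch.Size([8, 125, 30])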