Questions tagged [self-attention]

57 questions
3
votes
0 answers

How is scaled_dot_product_attention meant to be used with cached keys/values in causal LM?

I'm implementing a transformer and I have everything working, including attention using the new scaled_dot_product_attention from PyTorch 2.0. I'll only be doing causal attention, however, so it seems like it makes sense to use the is_causal=True…
turboderp
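
A minimal sketch of the usual pattern (not necessarily what the asker ended up with): use is_causal=True for the prefill pass over the full prompt, then drop it during incremental decoding, since a single-token query may attend to every cached position and is_causal would mis-align the mask once query and key lengths differ.

import torch
import torch.nn.functional as F

# hypothetical shapes: (batch, heads, seq, head_dim)
B, H, D = 1, 4, 64

# prefill: full prompt, causal mask applies
q = k = v = torch.randn(B, H, 10, D)
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# cache keys/values
k_cache, v_cache = k, v

# decode step: one new token; it may attend to all cached positions, so no
# causal mask is needed (is_causal=True assumes query and key lengths match)
q_new = torch.randn(B, H, 1, D)
k_new = torch.randn(B, H, 1, D)
v_new = torch.randn(B, H, 1, D)
k_cache = torch.cat([k_cache, k_new], dim=2)
v_cache = torch.cat([v_cache, v_new], dim=2)
out_new = F.scaled_dot_product_attention(q_new, k_cache, v_cache, is_causal=False)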
3
votes
0 answers

Implementing 1D self attention in PyTorch

I'm trying to implement in PyTorch the 1D self-attention block proposed in the following paper. Below you can find my (provisional) attempt: import torch.nn as nn import torch #INPUT shape ((B), CH, H, W) class…
James Arten
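
For reference, a minimal 1D self-attention block over a (B, C, L) tensor, using 1x1 convolutions for the query/key/value projections; the exact block in the paper the asker cites may differ.

import torch
import torch.nn as nn

class SelfAttention1d(nn.Module):
    # attention over the length dimension of a (B, C, L) tensor
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv1d(channels, channels, kernel_size=1)
        self.key = nn.Conv1d(channels, channels, kernel_size=1)
        self.value = nn.Conv1d(channels, channels, kernel_size=1)
        self.scale = channels ** -0.5

    def forward(self, x):                      # x: (B, C, L)
        q = self.query(x).transpose(1, 2)      # (B, L, C)
        k = self.key(x)                        # (B, C, L)
        v = self.value(x).transpose(1, 2)      # (B, L, C)
        attn = torch.softmax(q @ k * self.scale, dim=-1)  # (B, L, L)
        return (attn @ v).transpose(1, 2)      # (B, C, L)

x = torch.randn(2, 16, 100)
print(SelfAttention1d(16)(x).shape)            # torch.Size([2, 16, 100])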
2
votes
1 answer

Have I implemented self-attention correctly in Pytorch?

This is my attempt at implementing self-attention using PyTorch. Have I done anything wrong, or could it be improved somehow? class SelfAttention(nn.Module): def __init__(self, embedding_dim): super(SelfAttention, self).__init__() …
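
One way to answer this kind of question is to compare the hand-rolled attention numerically against PyTorch's built-in reference; a minimal sanity-check sketch:

import torch
import torch.nn.functional as F

# compare manual scaled dot-product attention with PyTorch's implementation
B, T, E = 2, 5, 8
q, k, v = torch.randn(B, T, E), torch.randn(B, T, E), torch.randn(B, T, E)

manual = torch.softmax(q @ k.transpose(-2, -1) / E ** 0.5, dim=-1) @ v
builtin = F.scaled_dot_product_attention(q, k, v)
print(torch.allclose(manual, builtin, atol=1e-5))  # True (up to numerical tolerance)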
2
votes
0 answers

MultiHeadAttention masking with tensorflow

I have been trying to make a custom mask for targeted combinations of queries and keys for my MultiHeadAttention layer, but I cannot figure out how to use this layer's masking. Here is an example with a dummy dataset (batch size 1): key =…
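
A minimal sketch of how Keras's MultiHeadAttention consumes a per-pair mask: attention_mask has shape (batch, num_queries, num_keys), and a False entry blocks that query-key combination (all shapes here are made up).

import numpy as np
import tensorflow as tf

mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=8)

# batch of 1, 4 query positions, 6 key/value positions, feature size 16
query = tf.random.normal((1, 4, 16))
value = tf.random.normal((1, 6, 16))

# attention_mask: (batch, num_queries, num_keys); False blocks that pair
mask = np.ones((1, 4, 6), dtype=bool)
mask[0, 0, 5] = False      # block attention from query position 0 to key position 5

out = mha(query, value, attention_mask=mask)
print(out.shape)           # (1, 4, 16)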
2
votes
1 answer

Keras MultiHeadAttention layer throwing IndexError: tuple index out of range

I'm getting this error over and over again when trying to do self-attention on 1D vectors. I don't really understand why it happens; any help would be greatly appreciated. layer = layers.MultiHeadAttention(num_heads=2, key_dim=2) target =…
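
The usual cause of that IndexError is that MultiHeadAttention expects at least a (batch, sequence, features) tensor; a minimal sketch that reshapes a 1-D vector accordingly:

import tensorflow as tf

layer = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=2)

# a 1-D vector has no batch or sequence axis, which is what typically triggers
# "tuple index out of range"; add both axes before calling the layer
vec = tf.random.normal((8,))
target = tf.reshape(vec, (1, 8, 1))   # (batch=1, seq=8, features=1)

out = layer(target, target)           # self-attention
print(out.shape)                      # (1, 8, 1)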
2
votes
1 answer

For an image or sequence, what are the properties transformers use?

Today my teacher asked me a question: he said that CNNs exploit the translation invariance of images or matrices. So what are the properties Transformers use?
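
The property usually cited is permutation equivariance: without positional encodings, self-attention has no built-in notion of order or locality, which is why positional information has to be added explicitly. A small demonstration (plain scaled dot-product attention, no learned projections):

import torch
import torch.nn.functional as F

# permuting the input tokens just permutes the output in the same way
torch.manual_seed(0)
x = torch.randn(1, 5, 8)                  # (batch, tokens, features)
perm = torch.randperm(5)

def attend(t):
    return F.scaled_dot_product_attention(t, t, t)

out = attend(x)
out_perm = attend(x[:, perm, :])
print(torch.allclose(out[:, perm, :], out_perm, atol=1e-6))  # True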
2
votes
1 answer

How to implement hierarchical Transformer for document classification in Keras?

A hierarchical attention mechanism for document classification was presented by Yang et al.: https://www.cs.cmu.edu/~./hovy/papers/16HLT-hierarchical-attention-networks.pdf Its implementation is available on…
Rahman
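
A minimal Keras sketch of the hierarchical idea, with the paper's GRU-plus-attention encoders replaced by MultiHeadAttention followed by average pooling (all sizes are made up): a word-level encoder turns each sentence into a vector, and a sentence-level encoder attends over those vectors to classify the document.

import tensorflow as tf
from tensorflow.keras import layers

MAX_SENTS, MAX_WORDS, VOCAB, EMB = 10, 30, 20000, 128

# word-level encoder: embeds one sentence and pools it into a single vector
words_in = layers.Input(shape=(MAX_WORDS,), dtype="int32")
w = layers.Embedding(VOCAB, EMB)(words_in)
w = layers.MultiHeadAttention(num_heads=4, key_dim=32)(w, w)
sent_vec = layers.GlobalAveragePooling1D()(w)
word_encoder = tf.keras.Model(words_in, sent_vec)

# document level: apply the word encoder to every sentence, then attend over sentences
doc_in = layers.Input(shape=(MAX_SENTS, MAX_WORDS), dtype="int32")
s = layers.TimeDistributed(word_encoder)(doc_in)
s = layers.MultiHeadAttention(num_heads=4, key_dim=32)(s, s)
doc_vec = layers.GlobalAveragePooling1D()(s)
out = layers.Dense(5, activation="softmax")(doc_vec)

model = tf.keras.Model(doc_in, out)
model.summary()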
1
vote
0 answers

Customizing the attention mechanism raises an error in TensorFlow: Gradients do not exist for variables

Sorry, I just started learning to build neural networks using tensorflow. The tensorflow version I am using is 2.3. I want to use a custom attention layer to associate the output of the encoding layer with another input; this is my main…
Ike
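
That warning usually means a weight created by the layer never participates in the computation that produces the loss (for example, it is created in build but not used in call, or the forward pass goes through non-TensorFlow ops). A minimal custom attention-pooling layer whose weight is actually used in call:

import tensorflow as tf

class AttentionPooling(tf.keras.layers.Layer):
    # scores each timestep against a learned vector and returns a weighted sum
    def build(self, input_shape):
        self.w = self.add_weight(name="w", shape=(input_shape[-1],),
                                 initializer="glorot_uniform")

    def call(self, inputs):                                    # (batch, time, features)
        scores = tf.reduce_sum(inputs * self.w, axis=-1)       # (batch, time)
        weights = tf.nn.softmax(scores, axis=-1)
        return tf.reduce_sum(inputs * weights[..., tf.newaxis], axis=1)

x = tf.random.normal((2, 7, 16))
print(AttentionPooling()(x).shape)    # (2, 16)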
1
vote
1 answer

tensorflow 2.10 vs 2.12, same training script, same data, significantly worse training for 2.12

I use this code https://www.kaggle.com/code/ritvik1909/masked-autoencoder-vision-transformer to train a transformer autoencoder. If I run the code under tensorflow 2.10, I obtain way better results than under 2.12. I don't change the…
1
vote
1 answer

How to read a BERT attention weight matrix?

I have extracted the attention score/weight matrix from the last layer and the last attention head of my BERT model. However, I am not sure how to read it. The matrix is the following one. I tried to find some more information in the…
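
A minimal sketch of how such a matrix is usually read, using the Hugging Face transformers API: row i holds the softmax-normalised weights with which token i (the query) attends to every token j (the keys), so each row sums to 1.

import torch
from transformers import BertTokenizer, BertModel

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tok("the cat sat on the mat", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# out.attentions: one tensor per layer, each (batch, num_heads, seq_len, seq_len)
last_head = out.attentions[-1][0, -1]   # last layer, last head
print(last_head.sum(dim=-1))            # ~1.0 for every row (query position)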
1
vote
0 answers

How to access the value projection at MultiHeadAttention layer in Pytorch

I'm writing my own implementation of the Graphormer architecture. Since this architecture needs to add an edge-based bias to the output of the key-query multiplication in the self-attention mechanism, I am adding that bias by hand and doing the…
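
When query, key and value share the same embedding dimension, nn.MultiheadAttention packs the three input projections into a single in_proj_weight of shape (3*embed_dim, embed_dim); the value projection is the last third. A minimal sketch of pulling it out and applying it by hand:

import torch
import torch.nn as nn
import torch.nn.functional as F

embed_dim, num_heads = 64, 4
mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

# split the packed projection into query, key and value parts
w_q, w_k, w_v = mha.in_proj_weight.chunk(3, dim=0)
b_q, b_k, b_v = mha.in_proj_bias.chunk(3, dim=0)

x = torch.randn(2, 10, embed_dim)
v_proj = F.linear(x, w_v, b_v)          # the value projection applied manually
print(v_proj.shape)                      # torch.Size([2, 10, 64])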
1
vote
0 answers

Receptive Field in Swin Transformer

I want to ask whether it is true that the receptive field of the Swin Transformer is just the local window where we compute the self-attention. And is there any way to increase the receptive field when using the Swin Transformer? I know that when we use…
1
vote
1 answer

Tensorflow Multi Head Attention on Inputs: 4 x 5 x 20 x 64 with attention_axes=2 throwing mask dimension error (tf 2.11.0)

The expectation here is that attention is applied on the 2nd dimension of (4, 5, 20, 64). I am trying to apply self-attention using the following code (issue reproducible with this code): import numpy as np import tensorflow as tf from keras import…
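
For reference, the unmasked case works as follows: attention_axes picks the axes attention is computed over, while the batch axis and the final feature axis are left alone. A minimal sketch (the question's error concerns the mask shape, which must match the resulting attention-score dimensions):

import tensorflow as tf

layer = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=16, attention_axes=2)
x = tf.random.normal((4, 5, 20, 64))   # attention runs along axis 2 (length 20)
out = layer(x, x)                       # self-attention, no mask
print(out.shape)                        # (4, 5, 20, 64)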
1
vote
1 answer

How to understand the self-attention mask implementation in google transformer tutorial

I am reading Google's transformer tutorial, and the part explaining why the attention_mask for multi-head attention can be built via mask1 & mask2 was unclear to me. Any help would be great! def call(self, x, training, mask): # A boolean mask. if…
user1269298
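
The broadcast AND of the padding mask with itself yields a (batch, seq, seq) matrix whose (i, j) entry is True only when both token i and token j are real, which is exactly the pairwise mask MultiHeadAttention expects. A small demonstration:

import tensorflow as tf

# padding mask for one sequence of length 5 whose last two positions are padding
mask = tf.constant([[True, True, True, False, False]])

mask1 = mask[:, :, tf.newaxis]   # (batch, seq, 1): is the query position real?
mask2 = mask[:, tf.newaxis, :]   # (batch, 1, seq): is the key position real?
attention_mask = mask1 & mask2   # (batch, seq, seq): True only if both are real

print(attention_mask[0].numpy().astype(int))
# [[1 1 1 0 0]
#  [1 1 1 0 0]
#  [1 1 1 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]]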
1
vote
0 answers

How to extract graph node embeddings from a Pytorch-Geometric GAT model?

Dataset Structure: Temporal directed graph; Nodes have features; Edges don't have features; Nodes are labelled. Using the Elliptic Dataset. Task: Classify nodes / predict node labels. Data Structure: 2 .csv files of nodes and edges. For the nodes csv…
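
A common pattern is to expose the hidden representation just before the classification head and call it after training; a minimal PyTorch Geometric sketch (layer sizes are made up):

import torch
from torch_geometric.nn import GATConv

class GAT(torch.nn.Module):
    # the output of the last GATConv is used as the node embedding
    def __init__(self, in_dim, hid_dim, num_classes):
        super().__init__()
        self.conv1 = GATConv(in_dim, hid_dim, heads=4, concat=True)
        self.conv2 = GATConv(4 * hid_dim, hid_dim, heads=1, concat=False)
        self.classifier = torch.nn.Linear(hid_dim, num_classes)

    def embed(self, x, edge_index):
        x = torch.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)           # (num_nodes, hid_dim)

    def forward(self, x, edge_index):
        return self.classifier(self.embed(x, edge_index))

# usage: after training, model.embed(data.x, data.edge_index) returns one
# embedding vector per node, which can be saved or fed to another model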