
To calculate self-attention, for each word we create a Query vector, a Key vector, and a Value vector. These vectors are created by multiplying the word's embedding by three matrices, WQ, WK, and WV, which are learned during training.
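For concreteness, the computation described above can be sketched as follows (a minimal sketch; the dimensions and random values are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8   # assumed embedding size, for illustration only
n_words = 4   # assumed sentence length

# One set of learned projection matrices: WQ, WK, WV.
WQ = rng.normal(size=(d_model, d_model))
WK = rng.normal(size=(d_model, d_model))
WV = rng.normal(size=(d_model, d_model))

# Embeddings for the input words, stacked as rows of X.
X = rng.normal(size=(n_words, d_model))

# Each word's embedding (each row of X) is multiplied by the
# same three matrices to produce its Query, Key, and Value.
Q = X @ WQ
K = X @ WK
V = X @ WV

print(Q.shape, K.shape, V.shape)  # (4, 8) (4, 8) (4, 8)
```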

Question: are these matrices WQ, WK, WV the same for every input word (embedding), or are they different for different words?

Paper link

Vinay Sharma
