
I'm writing my own implementation of the Graphormer architecture. Since this architecture needs to add an edge-based bias to the output of the key-query multiplication in the self-attention mechanism, I am adding that bias by hand and doing the matrix multiplication of the data with the attention weights outside the attention mechanism:

import torch as th
from torch import nn


# Variable initialization
B, T, C, H = 2, 3, 4, 2
self_attn = nn.MultiheadAttention(C, H, batch_first=True)

# Tensors
x = th.randn(B, T, C)
attn_bias = th.ones((B, T, T))

# Self-attention mechanism (attn_wei holds the attention weights,
# already softmax-normalized and averaged over the heads)
_, attn_wei = self_attn(query=x, key=x, value=x)

# Adding attention bias
if attn_bias is not None:
    attn_wei = attn_wei + attn_bias

x = attn_wei @ x # TODO use value(x) instead of x

print(x)

This works, but to use the full potential of self-attention, the last matrix multiplication should really be x = attn_wei @ value(x). However, I am not able to get the value projection out of the self_attn object, even though it must have something like that inside it.

How could I do this?
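
For reference, here is a minimal sketch of the kind of thing I am after. It assumes the value projection lives in the packed in_proj_weight / in_proj_bias of nn.MultiheadAttention (which seems to be the case when embed_dim == kdim == vdim), so it relies on PyTorch internals that may not be stable across versions:

import torch as th
import torch.nn.functional as F
from torch import nn

B, T, C, H = 2, 3, 4, 2
self_attn = nn.MultiheadAttention(C, H, batch_first=True)
x = th.randn(B, T, C)

# in_proj_weight packs the query, key and value projections into a
# (3*C, C) matrix, so the last third should be the value projection.
w_q, w_k, w_v = self_attn.in_proj_weight.chunk(3, dim=0)
b_q, b_k, b_v = self_attn.in_proj_bias.chunk(3, dim=0)

value = F.linear(x, w_v, b_v)  # value(x), shape (B, T, C)

_, attn_wei = self_attn(query=x, key=x, value=x)
attn_wei = attn_wei + th.ones((B, T, T))  # edge bias added by hand
x = attn_wei @ value

I realise this still ignores the per-head split and the output projection (self_attn.out_proj), so it is only an approximation of what the module does internally.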
