
Let the tensor shown below be the representation of two sentences (batch_size = 2), each composed of 3 words (max_length = 3), with each word represented by a vector of dimension 5 (hidden_size = 5), obtained as the output of a neural network:

net_output
# tensor([[[0.7718, 0.3856, 0.2545, 0.7502, 0.5844],
#          [0.4400, 0.3753, 0.4840, 0.2483, 0.4751],
#          [0.4927, 0.7380, 0.1502, 0.5222, 0.0093]],

#         [[0.5859, 0.0010, 0.2261, 0.6318, 0.5636],
#          [0.0996, 0.2178, 0.9003, 0.4708, 0.7501],
#          [0.4244, 0.7947, 0.5711, 0.0720, 0.1106]]])

Also consider the following attention scores:

att_scores
# tensor([[0.2425, 0.5279, 0.2295],
#         [0.2461, 0.4789, 0.2751]])

What is an efficient way to aggregate the vectors in net_output, weighted by att_scores, into a tensor of shape (2, 5)?

Celso França

1 Answer


This should work:

weighted = (net_output * att_scores[..., None]).sum(dim=1)

This uses broadcasting to multiply each word vector elementwise by its attention weight, then sums over the word dimension to aggregate the vectors of each sentence in the batch.
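A minimal runnable sketch of the same weighted sum, written in NumPy for illustration (PyTorch follows the same broadcasting rules); the shapes match those in the question, and the explicit loop is included only to verify the result:

```python
import numpy as np

# Two sentences (batch) of three words, each word a 5-dim vector
net_output = np.random.rand(2, 3, 5)
# One attention weight per word
att_scores = np.random.rand(2, 3)

# att_scores[..., None] has shape (2, 3, 1) and broadcasts
# against (2, 3, 5); summing over the word axis yields (2, 5)
weighted = (net_output * att_scores[..., None]).sum(axis=1)
assert weighted.shape == (2, 5)

# Equivalent explicit loop, for comparison
manual = np.stack([
    sum(att_scores[b, w] * net_output[b, w] for w in range(3))
    for b in range(2)
])
assert np.allclose(weighted, manual)
```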

swag2198
  • It worked. Could you give me some directions on what `att_scores[..., None]` means? Is it another approach to reshaping? – Celso França May 04 '21 at 17:35
  • `None` simply adds a new dimension to the tensor. It makes `att_scores` of shape (2, 3, 1), so that it becomes compatible for [broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html) with the output tensor of shape (2, 3, 5). – swag2198 May 04 '21 at 17:38
  • Great, thank you. It is like `att_scores.unsqueeze(-1)`. – Celso França May 04 '21 at 17:40
  • You can also look up [this](https://stackoverflow.com/questions/29241056/how-does-numpy-newaxis-work-and-when-to-use-it) answer for more explanation on numpy newaxis. – swag2198 May 04 '21 at 17:42
  • Yeah, exactly like the unsqueeze operation! – swag2198 May 04 '21 at 17:43
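As the comments note, indexing with `None` (which is what `np.newaxis` aliases) is equivalent to adding a dimension with `unsqueeze` in PyTorch or `expand_dims` in NumPy; a quick NumPy check of the equivalence:

```python
import numpy as np

att_scores = np.random.rand(2, 3)

a = att_scores[..., None]           # shape (2, 3, 1)
b = np.expand_dims(att_scores, -1)  # same result via expand_dims
c = att_scores[:, :, np.newaxis]    # np.newaxis is just an alias for None

assert a.shape == (2, 3, 1)
assert np.array_equal(a, b) and np.array_equal(a, c)
```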