Questions regarding the attention mechanism in deep learning
Questions tagged [attention-model]
389 questions
36
votes
5 answers
What is the difference between Luong attention and Bahdanau attention?
These two attention mechanisms are used in seq2seq models. They are introduced as multiplicative and additive attention in this TensorFlow documentation. What is the difference?
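A minimal NumPy sketch may help frame the answers: only the scoring function differs between the two, and the matrices W, W1, W2 and vector v below stand in for hypothetical learned parameters. Luong's multiplicative score is a (weighted) dot product, while Bahdanau's additive score passes the states through a small feed-forward layer; both feed the same softmax.

import numpy as np

# Toy dimensions: decoder state s has size d, encoder states H are (T, d).
d, T = 4, 5
rng = np.random.default_rng(0)
s, H = rng.normal(size=d), rng.normal(size=(T, d))

# Luong (multiplicative) "general" score: s^T W h_t for each encoder state h_t.
W = rng.normal(size=(d, d))
luong_scores = H @ W @ s                                   # shape (T,)

# Bahdanau (additive) score: v^T tanh(W1 s + W2 h_t).
W1, W2, v = rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=d)
bahdanau_scores = np.tanh(s @ W1.T + H @ W2.T) @ v         # shape (T,)

# Both variants turn their scores into attention weights with the same softmax.
def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

print(softmax(luong_scores))
print(softmax(bahdanau_scores))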

Shamane Siriwardhana
- 3,951
- 6
- 33
- 73
29
votes
3 answers
How to understand masked multi-head attention in the Transformer
I'm currently studying the code of the Transformer, but I cannot understand the masked multi-head attention in the decoder. The paper says that it is there to prevent you from seeing the word being generated, but I cannot understand: if the words after the generated word have not…
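A minimal PyTorch sketch of the decoder's causal (look-ahead) mask, with toy sizes: scores for positions after the current one are set to -inf before the softmax, so each position can only attend to itself and earlier positions even though the whole target sequence is fed in at once during training.

import torch
import torch.nn.functional as F

T, d = 5, 8                                    # sequence length, model dim (toy sizes)
q = k = v = torch.randn(T, d)                  # decoder self-attention: q, k, v from the same sequence

scores = q @ k.t() / d ** 0.5                  # (T, T) raw attention scores
causal_mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(causal_mask, float('-inf'))    # hide future positions

weights = F.softmax(scores, dim=-1)            # row t only attends to positions <= t
out = weights @ v
print(weights)                                 # the upper triangle is exactly zero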

Neptuner
- 291
- 1
- 3
- 3
20
votes
2 answers
What is the difference between attn_mask and key_padding_mask in MultiheadAttention?
What is the difference between attn_mask and key_padding_mask in PyTorch's MultiheadAttention?
key_padding_mask – if provided, specified padding elements in the key will be ignored by the attention. When given a binary mask and a value is True, the…
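A small sketch of how the two masks are passed to torch.nn.MultiheadAttention (shapes and mask values below are made up for illustration): key_padding_mask is per sample and hides whole padded key positions from every query, while attn_mask is per query-key pair and shared by all samples, e.g. a causal mask.

import torch
import torch.nn as nn

N, L, S, E = 2, 4, 4, 8                        # batch, target length, source length, embed dim
mha = nn.MultiheadAttention(embed_dim=E, num_heads=2)

# Default layout is (seq_len, batch, embed_dim).
query = torch.randn(L, N, E)
key = value = torch.randn(S, N, E)

# key_padding_mask: shape (N, S); True marks padded keys that every query should ignore.
key_padding_mask = torch.tensor([[False, False, False, True],
                                 [False, False, True,  True]])

# attn_mask: shape (L, S); the same pattern for every sample, here a causal mask.
attn_mask = torch.triu(torch.ones(L, S, dtype=torch.bool), diagonal=1)

out, weights = mha(query, key, value,
                   key_padding_mask=key_padding_mask,
                   attn_mask=attn_mask)
print(weights.shape)                           # (N, L, S); masked positions get ~zero weight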

one
- 2,205
- 1
- 15
- 37
17
votes
1 answer
Adding Attention on top of simple LSTM layer in Tensorflow 2.0
I have a simple network of one LSTM layer and two Dense layers, as follows:
model = tf.keras.Sequential()
model.add(layers.LSTM(20, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(layers.Dense(20, activation='sigmoid'))
model.add(layers.Dense(1,…
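One possible way to extend such a model, sketched with the built-in tf.keras.layers.Attention and assumed input dimensions (substitute train_X.shape[1] and train_X.shape[2]): the LSTM must return its full sequence, and here its last hidden state is used as the query.

import tensorflow as tf
from tensorflow.keras import layers

timesteps, features = 10, 3                    # assumed; use train_X.shape[1], train_X.shape[2]
inputs = tf.keras.Input(shape=(timesteps, features))

seq = layers.LSTM(20, return_sequences=True)(inputs)       # keep all timesteps for attention
query = layers.Lambda(lambda t: t[:, -1:, :])(seq)         # last hidden state as query, (batch, 1, 20)

context = layers.Attention()([query, seq])                 # dot-product attention over the sequence
context = layers.Flatten()(context)

x = layers.Dense(20, activation='sigmoid')(context)
outputs = layers.Dense(1)(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')
model.summary()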

greco.roamin
- 799
- 1
- 6
- 20
17
votes
2 answers
Does attention make sense for Autoencoders?
I am struggling with the concept of attention in the context of autoencoders. I believe I understand the usage of attention with regard to seq2seq translation - after training the combined encoder and decoder, we can use both encoder and…
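As a concrete reference point for the discussion, here is one way attention could be wired into a sequence autoencoder (a sketch with assumed shapes, not a recommendation): the decoder attends back over the encoder outputs instead of relying only on the bottleneck state.

import tensorflow as tf
from tensorflow.keras import layers

timesteps, n_features = 12, 4                  # assumed input shape
inputs = tf.keras.Input(shape=(timesteps, n_features))

enc_seq, state_h, state_c = layers.LSTM(32, return_sequences=True, return_state=True)(inputs)
dec_seq = layers.LSTM(32, return_sequences=True)(enc_seq, initial_state=[state_h, state_c])

# Each decoder step attends back over the encoder outputs rather than relying on the
# single bottleneck state alone.
context = layers.AdditiveAttention()([dec_seq, enc_seq])
recon = layers.TimeDistributed(layers.Dense(n_features))(layers.Concatenate()([dec_seq, context]))

autoencoder = tf.keras.Model(inputs, recon)
autoencoder.compile(optimizer='adam', loss='mse')

Because the decoder can then look at every encoder step, the bottleneck is largely bypassed, which is exactly the trade-off the question is about.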

user3641187
- 405
- 5
- 10
16
votes
3 answers
How to build an attention model with Keras?
I am trying to understand attention models and also build one myself. After many searches I came across this website, which has an attention model coded in Keras and also looks simple. But when I tried to build that same model on my machine, it's giving…
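For orientation, a minimal attention-pooling pattern in Keras (a sketch, not the model from the linked website): score each timestep with a shared Dense layer, softmax over time, and take the weighted sum of the LSTM outputs. Shapes are assumptions.

import tensorflow as tf
from tensorflow.keras import layers

timesteps, n_features = 20, 8                  # assumed input shape
inputs = tf.keras.Input(shape=(timesteps, n_features))
h = layers.LSTM(32, return_sequences=True)(inputs)         # (batch, timesteps, 32)

scores = layers.Dense(1, activation='tanh')(h)             # one score per timestep
weights = layers.Softmax(axis=1)(scores)                   # normalize over the time axis
context = layers.Dot(axes=1)([weights, h])                 # weighted sum -> (batch, 1, 32)
context = layers.Flatten()(context)

outputs = layers.Dense(1, activation='sigmoid')(context)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='binary_crossentropy')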

Eka
- 14,170
- 38
- 128
- 212
16
votes
5 answers
RuntimeError: "exp" not implemented for 'torch.LongTensor'
I am following this tutorial: http://nlp.seas.harvard.edu/2018/04/03/attention.html
to implement the Transformer model from the "Attention Is All You Need" paper.
However, I am getting the following error:
RuntimeError: "exp" not implemented for…

noob
- 5,954
- 6
- 20
- 32
16
votes
2 answers
Attention Layer throwing TypeError: Permute layer does not support masking in Keras
I have been following this post in order to implement an attention layer over my LSTM model.
Code for the attention layer:
INPUT_DIM = 2
TIME_STEPS = 20
SINGLE_ATTENTION_VECTOR = False
APPLY_ATTENTION_BEFORE_LSTM = False
def…
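The TypeError usually means a mask (from Embedding(mask_zero=True) or a Masking layer) is reaching Permute, which cannot consume it. One workaround, sketched under that assumption, is a thin layer that declares mask support and deliberately drops the mask before the Permute/Dense/Permute/Multiply attention block; padding then has to be handled explicitly.

import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical wrapper: declares mask support and deliberately stops the mask here.
class DropMask(layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.supports_masking = True

    def compute_mask(self, inputs, mask=None):
        return None                            # do not propagate the mask any further

    def call(self, inputs, mask=None):
        return inputs

TIME_STEPS, INPUT_DIM = 20, 2
x = tf.keras.Input(shape=(TIME_STEPS, INPUT_DIM))
h = layers.Masking()(x)                        # stands in for Embedding(mask_zero=True) etc.
h = layers.LSTM(32, return_sequences=True)(h)
h = DropMask()(h)                              # without this line, Permute raises the TypeError
a = layers.Permute((2, 1))(h)
a = layers.Dense(TIME_STEPS, activation='softmax')(a)
a = layers.Permute((2, 1))(a)
attended = layers.Multiply()([h, a])
model = tf.keras.Model(x, layers.Flatten()(attended))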

Saurav--
- 1,530
- 2
- 15
- 33
14
votes
1 answer
How to visualize attention in an LSTM using the keras-self-attention package?
I'm using keras-self-attention to implement an attention LSTM in Keras. How can I visualize the attention part after training the model? This is a time-series forecasting case.
from keras.models import Sequential
from keras_self_attention import…
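A sketch of one approach, assuming the package's SeqSelfAttention accepts return_attention=True and then returns both its outputs and the attention weights (as its README describes), and that it is compatible with the Keras flavour you use: build a second model that exposes the weights and plot them as a heatmap. Shapes and data below are placeholders.

import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
from keras_self_attention import SeqSelfAttention

timesteps, n_features = 30, 5                  # assumed input shape
inputs = keras.Input(shape=(timesteps, n_features))
h = keras.layers.LSTM(32, return_sequences=True)(inputs)
h, attn = SeqSelfAttention(return_attention=True,
                           attention_activation='sigmoid')(h)
out = keras.layers.Dense(1)(keras.layers.Flatten()(h))

model = keras.Model(inputs, out)               # train this as usual
viewer = keras.Model(inputs, attn)             # second model that exposes the weights

sample = np.random.rand(1, timesteps, n_features)
plt.imshow(viewer.predict(sample)[0], cmap='viridis')      # (timesteps, timesteps) matrix
plt.xlabel('attended timestep')
plt.ylabel('query timestep')
plt.show()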

Eghbal
- 3,892
- 13
- 51
- 112
14
votes
2 answers
Why is the embedding vector multiplied by a constant in the Transformer model?
I am learning to apply the Transformer model proposed in Attention Is All You Need, following TensorFlow's official tutorial Transformer model for language understanding.
As the section Positional encoding says:
Since this model doesn't contain any recurrence or…
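A small sketch of what the tutorial does: the embedding output is multiplied by sqrt(d_model) before the positional encoding is added. One commonly given reason is that freshly initialized embeddings are much smaller in magnitude than the sin/cos positional values in [-1, 1], so without the scaling the positional signal would dominate the sum. Sizes below are illustrative.

import tensorflow as tf

d_model, vocab_size, seq_len = 512, 8000, 10
embedding = tf.keras.layers.Embedding(vocab_size, d_model)

tokens = tf.random.uniform((1, seq_len), maxval=vocab_size, dtype=tf.int32)
x = embedding(tokens)                                      # freshly initialized values are tiny (~±0.05)
x *= tf.math.sqrt(tf.cast(d_model, tf.float32))            # the constant: sqrt(512) ≈ 22.6

pos_encoding = tf.ones((1, seq_len, d_model))              # stand-in for the sin/cos table in [-1, 1]
out = x + pos_encoding
print(out.shape)                                           # (1, 10, 512)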

giser_yugang
- 6,058
- 4
- 21
- 44
13
votes
2 answers
Keras - Add attention mechanism to an LSTM model
With the following code:
model = Sequential()
num_features = data.shape[2]
num_samples = data.shape[1]
model.add(
LSTM(16, batch_input_shape=(None, num_samples, num_features), return_sequences=True,…
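One possible rewrite of such a model with attention inserted, sketched with the built-in Bahdanau-style tf.keras.layers.AdditiveAttention and assumed values for num_samples and num_features:

import tensorflow as tf
from tensorflow.keras import layers

num_samples, num_features = 30, 6              # assumed; the question takes these from data.shape
inputs = tf.keras.Input(shape=(num_samples, num_features))

seq, state_h, _ = layers.LSTM(16, return_sequences=True, return_state=True)(inputs)
query = layers.Reshape((1, 16))(state_h)                   # final hidden state as the query

context = layers.AdditiveAttention()([query, seq])         # Bahdanau-style scoring
context = layers.Flatten()(context)
outputs = layers.Dense(1)(context)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')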

Shlomi Schwartz
- 8,693
- 29
- 109
- 186
12
votes
2 answers
Why must embed dimension be divisible by the number of heads in MultiheadAttention?
I am learning about the Transformer. Here is the PyTorch documentation for MultiheadAttention. In their implementation, I saw there is a constraint:
assert self.head_dim * num_heads == self.embed_dim, "embed_dim must be divisible by num_heads"
Why require…
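A short sketch of why the assert is there: the projected embedding is reshaped so that each head owns an equal head_dim slice of the embed_dim channels, and the heads are concatenated back afterwards; an embed_dim that is not divisible by num_heads would leave channels belonging to no head.

import torch

embed_dim, num_heads = 12, 3
head_dim = embed_dim // num_heads              # 4 channels per head
assert head_dim * num_heads == embed_dim       # the constraint from the PyTorch source

x = torch.randn(2, 5, embed_dim)               # (batch, seq_len, embed_dim), already projected
heads = x.view(2, 5, num_heads, head_dim).transpose(1, 2)  # (batch, num_heads, seq_len, head_dim)
print(heads.shape)

# After attention, the per-head results are concatenated back to embed_dim.
merged = heads.transpose(1, 2).reshape(2, 5, embed_dim)
print(merged.shape)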

jason
- 1,998
- 3
- 22
- 42
12
votes
2 answers
Should RNN attention weights over variable length sequences be re-normalized to "mask" the effects of zero-padding?
To be clear, I am referring to "self-attention" of the type described in Hierarchical Attention Networks for Document Classification and implemented in many places, for example: here. I am not referring to the seq2seq type of attention used in…
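A small NumPy sketch of the two options the question compares, with made-up scores: renormalizing the weights of the valid steps is exactly equivalent to a masked softmax that assigns -inf scores to the padded steps.

import numpy as np

scores = np.array([2.0, 1.0, 0.5, 0.0, 0.0])   # made-up scores; last two steps are padding
mask = np.array([1, 1, 1, 0, 0], dtype=bool)

# Unmasked softmax: the padded steps still receive probability mass.
naive = np.exp(scores) / np.exp(scores).sum()

# Masked softmax: give padded steps a -inf score before normalizing...
masked_scores = np.where(mask, scores, -np.inf)
masked = np.exp(masked_scores - masked_scores.max())
masked /= masked.sum()

# ...which matches zeroing the naive weights and re-normalizing them.
renorm = naive * mask
renorm /= renorm.sum()

print(naive.round(3))
print(masked.round(3))
print(renorm.round(3))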

t-flow
- 123
- 8
12
votes
1 answer
Visualizing attention activation in Tensorflow
Is there a way to visualize the attention weights on some input, like the figure in the link above (from Bahdanau et al., 2014), in TensorFlow's seq2seq models? I have found TensorFlow's GitHub issue regarding this, but I couldn't find out how to…
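Independently of how the weights are extracted from TensorFlow's seq2seq model, the figure itself is just a labelled heatmap of the (target_len, source_len) alignment matrix; a sketch with placeholder tokens and weights:

import numpy as np
import matplotlib.pyplot as plt

# Assumes you can already get the alignment matrix for one example from the model;
# tokens and weights below are placeholders.
source = ['the', 'cat', 'sat', '</s>']
target = ['le', 'chat', 'était', 'assis', '</s>']
attn = np.random.dirichlet(np.ones(len(source)), size=len(target))

fig, ax = plt.subplots()
ax.imshow(attn, cmap='gray')
ax.set_xticks(range(len(source)))
ax.set_xticklabels(source)
ax.set_yticks(range(len(target)))
ax.set_yticklabels(target)
ax.set_xlabel('source token')
ax.set_ylabel('target token')
plt.show()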

reiste
- 123
- 1
- 5
11
votes
2 answers
How can LSTM attention have variable-length input?
The attention mechanism for LSTMs is a simple softmax feed-forward network that takes in the encoder's hidden state at each time step together with the decoder's current state.
These two steps seem to contradict each other, and I can't wrap my head around it:
1) The…
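A sketch of why variable length is not a problem: the scoring network is applied to each encoder state independently with shared weights, so it produces one score per timestep no matter how many there are, and the softmax simply normalizes over that many scores. Sizes and the concatenation-based scorer below are illustrative.

import torch
import torch.nn.functional as F

d = 16
score_layer = torch.nn.Linear(2 * d, 1)        # shared scorer: sees [h_t; s] for one timestep at a time

def attend(encoder_states, decoder_state):
    # encoder_states: (T, d) for any T; decoder_state: (d,)
    T = encoder_states.size(0)
    pairs = torch.cat([encoder_states, decoder_state.expand(T, d)], dim=1)   # (T, 2d)
    scores = score_layer(pairs).squeeze(1)     # one score per timestep, whatever T is
    weights = F.softmax(scores, dim=0)         # normalized over however many scores there are
    return weights @ encoder_states            # (d,) context vector

for T in (3, 7, 50):                           # the same parameters handle any length
    print(attend(torch.randn(T, d), torch.randn(d)).shape)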

Andrew Tu
- 258
- 3
- 8