Seq2Seq is a sequence-to-sequence learning add-on for the Python deep learning library Keras.
Questions tagged [seq2seq]
318 questions
14
votes
2 answers
Multilayer Seq2Seq model with LSTM in Keras
I was making a seq2seq model in Keras. I had built a single-layer encoder and decoder, and they were working fine. But now I want to extend it to a multi-layer encoder and decoder.
I am building it using Keras Functional API.
Training:-
Code for…

SAGAR
- 151
- 1
- 7
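
For the multilayer question above, a minimal sketch of one way to stack two LSTM layers in both the encoder and decoder with the Keras Functional API, passing each encoder layer's final states to the matching decoder layer. The dimensions (latent_dim, num_encoder_tokens, num_decoder_tokens) are illustrative, not taken from the question.

from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

latent_dim, num_encoder_tokens, num_decoder_tokens = 256, 71, 93

# Encoder: stack two LSTMs; every layer except the last returns full sequences.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
enc_seq1, enc_h1, enc_c1 = LSTM(latent_dim, return_sequences=True, return_state=True)(encoder_inputs)
_, enc_h2, enc_c2 = LSTM(latent_dim, return_state=True)(enc_seq1)

# Decoder: mirror the stack and seed each layer with the matching encoder states.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
dec_seq1, _, _ = LSTM(latent_dim, return_sequences=True, return_state=True)(
    decoder_inputs, initial_state=[enc_h1, enc_c1])
dec_seq2, _, _ = LSTM(latent_dim, return_sequences=True, return_state=True)(
    dec_seq1, initial_state=[enc_h2, enc_c2])
decoder_outputs = Dense(num_decoder_tokens, activation="softmax")(dec_seq2)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")
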
12
votes
1 answer
Prepare Decoder of a Sequence to Sequence Network in PyTorch
I was working with Sequence to Sequence models in PyTorch. Sequence to Sequence models comprise an Encoder and a Decoder.
The Encoder converts a (batch_size X input_features X num_of_one_hot_encoded_classes) -> (batch_size X input_features X…

Shubhashis
- 10,411
- 11
- 33
- 48
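
A minimal sketch of the decoder side being asked about: a GRU decoder seeded with the encoder's final hidden state and unrolled one step at a time with teacher forcing. All sizes and the start-token id are illustrative, not the asker's code.

import torch
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, token, hidden):
        # token: (batch, 1) ids; hidden: (1, batch, hidden_size) from the encoder
        emb = self.embedding(token)                 # (batch, 1, hidden_size)
        output, hidden = self.gru(emb, hidden)      # (batch, 1, hidden_size)
        return self.out(output.squeeze(1)), hidden  # logits: (batch, vocab_size)

# Unroll with teacher forcing: feed the gold token back in at every step.
decoder = Decoder(vocab_size=1000, hidden_size=128)
hidden = torch.zeros(1, 4, 128)                     # stand-in for the encoder's final state
targets = torch.randint(0, 1000, (4, 10))           # (batch, target_len)
token = torch.zeros(4, 1, dtype=torch.long)         # assume the start token has id 0
for t in range(targets.size(1)):
    logits, hidden = decoder(token, hidden)
    token = targets[:, t].unsqueeze(1)
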
11
votes
3 answers
Why do we do batch matrix-matrix product?
I'm following the PyTorch seq2seq tutorial and the torch.bmm method is used as below:
attn_applied = torch.bmm(attn_weights.unsqueeze(0),
                         encoder_outputs.unsqueeze(0))
I understand why we need to multiply attention weight and…

aerin
- 20,607
- 28
- 102
- 140
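
For the torch.bmm question, a small self-contained example (tensor sizes are made up) of why a batched matrix product is the natural way to apply one set of attention weights per batch element:

import torch

batch, src_len, hidden = 3, 7, 16
attn_weights = torch.softmax(torch.randn(batch, 1, src_len), dim=-1)  # (B, 1, T)
encoder_outputs = torch.randn(batch, src_len, hidden)                 # (B, T, H)

# bmm multiplies (B, 1, T) x (B, T, H) -> (B, 1, H): a weighted sum of the encoder
# states, computed independently for every item in the batch in one call.
attn_applied = torch.bmm(attn_weights, encoder_outputs)
print(attn_applied.shape)  # torch.Size([3, 1, 16])
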
9
votes
2 answers
Seq2Seq model learns to only output EOS token (</s>) after a few iterations
I am creating a chatbot trained on the Cornell Movie Dialogs Corpus using NMT.
I am basing my code in part on https://github.com/bshao001/ChatLearner and https://github.com/chiphuyen/stanford-tensorflow-tutorials/tree/master/assignments/chatbot
During…

noel
- 99
- 1
- 5
6
votes
1 answer
Keras, model trains successfully but generating predictions gives ValueError: Graph disconnected: cannot obtain value for tensor KerasTensor
I created a Seq2Seq model for text summarization. I have two models, one with attention and one without. The one without attention was able to generate predictions, but I can't generate them with the attention model even though it fits successfully.
This…

BlueMango
- 463
- 7
- 21
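
A common cause of "Graph disconnected" with attention is that the inference decoder references tensors that exist only in the training graph. A hedged sketch of the usual fix, with layer sizes and names purely illustrative: reuse the trained layers, but declare the encoder sequence and states as fresh Inputs of the inference decoder model.

from tensorflow.keras.layers import Input, LSTM, Dense, Attention, Concatenate
from tensorflow.keras.models import Model

latent_dim, vocab = 128, 500  # illustrative sizes

# Training graph (simplified).
enc_in = Input(shape=(None, vocab))
enc_seq, h, c = LSTM(latent_dim, return_sequences=True, return_state=True)(enc_in)
dec_in = Input(shape=(None, vocab))
dec_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
dec_seq, _, _ = dec_lstm(dec_in, initial_state=[h, c])
attn = Attention()
dense = Dense(vocab, activation="softmax")
out = dense(Concatenate()([dec_seq, attn([dec_seq, enc_seq])]))
train_model = Model([enc_in, dec_in], out)

# Inference encoder also returns the full sequence that attention will need.
encoder_model = Model(enc_in, [enc_seq, h, c])

# Inference decoder: encoder sequence and states arrive as fresh Inputs, so every
# tensor the decoder consumes belongs to its own graph.
enc_seq_in = Input(shape=(None, latent_dim))
h_in, c_in = Input(shape=(latent_dim,)), Input(shape=(latent_dim,))
dec_seq2, h2, c2 = dec_lstm(dec_in, initial_state=[h_in, c_in])
out2 = dense(Concatenate()([dec_seq2, attn([dec_seq2, enc_seq_in])]))
decoder_model = Model([dec_in, enc_seq_in, h_in, c_in], [out2, h2, c2])
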
6
votes
1 answer
seq2seq to predict next time step
I'm currently trying to predict the next sequence of goods a customer is likely to buy in the next time period. The following example is for illustrative purposes (my actual dataset has around 6 million customer IDs and 5000 different products).
My…

M3105
- 519
- 2
- 7
- 20
6
votes
1 answer
Implementing Luong Attention in PyTorch
I am trying to implement the attention described in Luong et al. 2015 in PyTorch myself, but I couldn't get it to work. Below is my code; I am only interested in the "general" attention case for now. I wonder if I am missing any obvious error. It runs,…

zyxue
- 7,904
- 5
- 48
- 74
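
For the Luong attention question, a minimal sketch of the "general" score from Luong et al. 2015, score(h_t, h_s) = h_t^T W_a h_s, implemented with one bias-free linear layer. Shapes are illustrative.

import torch
import torch.nn as nn

class GeneralAttention(nn.Module):
    def __init__(self, hidden):
        super().__init__()
        self.W_a = nn.Linear(hidden, hidden, bias=False)

    def forward(self, dec_hidden, enc_outputs):
        # dec_hidden: (B, H) current decoder state; enc_outputs: (B, T, H)
        scores = torch.bmm(self.W_a(enc_outputs),                 # (B, T, H)
                           dec_hidden.unsqueeze(2)).squeeze(2)    # (B, T)
        weights = torch.softmax(scores, dim=1)                    # (B, T)
        context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)  # (B, H)
        return context, weights

attn = GeneralAttention(hidden=64)
context, weights = attn(torch.randn(2, 64), torch.randn(2, 9, 64))
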
5
votes
1 answer
Equivalent of tf.contrib.legacy_seq2seq.attention_decoder in tensorflow 2 after upgrade
I have the following code in TensorFlow 1.0. I tried to migrate it to TensorFlow 2.0 using the tf_upgrade_v2 script. However, it didn't find an equivalent function in the TF 2 compat version.
I was recommended to use tensorflow_addons. However, I don't…

skwolvie
- 139
- 12
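
For the migration question, a hedged sketch of the tensorflow_addons stack that is commonly used in place of tf.contrib.legacy_seq2seq.attention_decoder. All sizes and tensors here are illustrative stand-ins; check the tfa.seq2seq documentation for the exact arguments your model needs.

import tensorflow as tf
import tensorflow_addons as tfa

units, vocab_size, batch, src_len, tgt_len = 256, 1000, 4, 12, 10
encoder_outputs = tf.random.normal([batch, src_len, units])   # stand-in memory
dec_emb = tf.random.normal([batch, tgt_len, units])           # stand-in embedded targets

attention = tfa.seq2seq.LuongAttention(units, memory=encoder_outputs)
cell = tfa.seq2seq.AttentionWrapper(tf.keras.layers.LSTMCell(units),
                                    attention,
                                    attention_layer_size=units)
decoder = tfa.seq2seq.BasicDecoder(cell,
                                   sampler=tfa.seq2seq.TrainingSampler(),
                                   output_layer=tf.keras.layers.Dense(vocab_size))

initial_state = cell.get_initial_state(batch_size=batch, dtype=tf.float32)
outputs, _, _ = decoder(dec_emb,
                        initial_state=initial_state,
                        sequence_length=batch * [tgt_len])
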
5
votes
2 answers
PyTorch: Different Forward Methods for Train and Test/Validation
I'm currently trying to extend a model that is based on FairSeq/PyTorch. During training I need to train two encoders: one with the target sample, and the original one with the source sample.
So the current forward function looks like this:
def…

qwertz
- 315
- 1
- 4
- 14
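
For the train vs. test forward question, a minimal sketch (module names are illustrative) of the common pattern of branching on self.training, which model.train() and model.eval() toggle, so no extra arguments are needed:

import torch
import torch.nn as nn

class DualEncoderModel(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.src_encoder = nn.Linear(dim, dim)   # stand-in for the source encoder
        self.tgt_encoder = nn.Linear(dim, dim)   # stand-in for the target encoder
        self.decoder = nn.Linear(dim, dim)

    def forward(self, src, tgt=None):
        if self.training and tgt is not None:
            # Training path: both encoders are used.
            enc = self.src_encoder(src) + self.tgt_encoder(tgt)
        else:
            # Validation/test path: only the source encoder is used.
            enc = self.src_encoder(src)
        return self.decoder(enc)

model = DualEncoderModel()
model.train()
out_train = model(torch.randn(2, 32), torch.randn(2, 32))
model.eval()
out_eval = model(torch.randn(2, 32))
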
5
votes
3 answers
embedding layer outputs nan
I am trying to train a seq2seq model.
An embedding layer is located in the encoder, and it sometimes outputs NaN values after some iterations.
I cannot identify the reason.
How can I solve this?
The problem is the first emb_layer in the forward…

kintsuba
- 139
- 2
- 7
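
For the NaN embedding question, a hedged diagnostic sketch (hook names and thresholds are illustrative): NaNs in an embedding usually arrive through exploding gradients, so flagging the first non-finite gradient and clipping are common first steps.

import torch
import torch.nn as nn

emb_layer = nn.Embedding(num_embeddings=5000, embedding_dim=128, padding_idx=0)

def check_finite(module, grad_input, grad_output):
    # Fires during backward; flags the step at which gradients blow up.
    if any(g is not None and not torch.isfinite(g).all() for g in grad_output):
        print(f"non-finite gradient flowing into {module.__class__.__name__}")

emb_layer.register_full_backward_hook(check_finite)

# In the training loop, clip before optimizer.step():
#   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
# and verify the forward output as well:
tokens = torch.randint(0, 5000, (4, 16))
assert torch.isfinite(emb_layer(tokens)).all()
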
5
votes
1 answer
keras - seq2seq model predicting same output for all test inputs
I am trying to build a seq2seq model using LSTM in Keras. I am currently working on the English-to-French pairs dataset (10k pairs; the original dataset has 147k pairs). After training is completed, while trying to predict the output for the given input sequence…

Sunil
- 141
- 1
- 9
5
votes
0 answers
Graph building fails at tf.scatter_nd due to placeholder shape limitations
Using scatter_nd to project an attention distribution onto another distribution, essentially creating a distribution that references a vocabulary.
indices = tf.stack((batch_nums, encoder_batch), axis=2)
shape = [batch_size,…

Arya Vohra
- 71
- 1
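
A small sketch of the scatter_nd pattern being described, with concrete sizes that are purely illustrative: per-position attention weights are scattered into a vocabulary-sized distribution, and the batch size is read dynamically with tf.shape so a placeholder with an unknown batch dimension still works.

import tensorflow as tf

vocab_size, src_len = 20, 5                                     # illustrative sizes
attn_dists = tf.constant([[0.1, 0.2, 0.3, 0.2, 0.2],
                          [0.5, 0.1, 0.1, 0.2, 0.1]])           # (batch, src_len)
encoder_batch = tf.constant([[3, 7, 7, 2, 9],
                             [1, 4, 4, 0, 5]])                  # source token ids

batch_size = tf.shape(attn_dists)[0]                            # dynamic, not static
batch_nums = tf.tile(tf.expand_dims(tf.range(batch_size), 1), [1, src_len])
indices = tf.stack((batch_nums, encoder_batch), axis=2)         # (batch, src_len, 2)
shape = tf.stack([batch_size, vocab_size])
# Duplicate token ids are summed, giving each vocabulary entry its total attention.
vocab_dists = tf.scatter_nd(indices, attn_dists, shape)         # (batch, vocab_size)
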
5
votes
1 answer
tensorflow code TypeError: unsupported operand type(s) for *: 'int' and 'Flag'
BATCH_QUEUE_MAX = 100
self._data_path = data_path
self._vocab = vocab
self._hps = hps
self._single_pass = single_pass
# Initialize a queue of Batches waiting to be used, and a queue of Examples waiting to be batched
self._batch_queue =…

dongmei
- 89
- 1
- 6
4
votes
1 answer
Restrict Vocab for BERT Encoder-Decoder Text Generation
Is there any way to restrict the vocabulary of the decoder in a Huggingface BERT encoder-decoder model? I'd like to force the decoder to choose from a small vocabulary when generating text rather than BERT's entire ~30k vocabulary.

Joseph Harvey
- 83
- 1
- 5
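
For the restricted-vocabulary question, a hedged sketch using prefix_allowed_tokens_fn, the generate() hook that constrains which token ids may be produced at each step. The model names and the allowed id set are illustrative.

from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased")
model.config.pad_token_id = tokenizer.pad_token_id

# Illustrative restricted vocabulary: only these ids may ever be generated.
allowed_ids = tokenizer.convert_tokens_to_ids(
    ["yes", "no", "maybe", "[CLS]", "[SEP]", "[PAD]"])

def allowed_tokens(batch_id, input_ids):
    # Called at every decoding step for every sequence in the batch.
    return allowed_ids

inputs = tokenizer("should we restrict the vocabulary?", return_tensors="pt")
out = model.generate(inputs.input_ids,
                     prefix_allowed_tokens_fn=allowed_tokens,
                     decoder_start_token_id=tokenizer.cls_token_id,
                     max_length=8)
print(tokenizer.decode(out[0]))
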
4
votes
1 answer
Where to find a Seq2SeqTrainer to import into project?
Like the title says, I require a Seq2SeqTrainer for my project, but the file(s) on GitHub are not available and return a 404. I use this code to try to import it:
!wget…

BzeQ
- 93
- 1
- 11
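
Regarding the 404s above: recent versions of transformers ship the trainer in the package itself, so it can be imported directly instead of being downloaded from GitHub. A hedged sketch with illustrative argument values:

from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="out",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    predict_with_generate=True,
)
# trainer = Seq2SeqTrainer(model=model, args=training_args,
#                          train_dataset=train_ds, eval_dataset=val_ds)
# trainer.train()
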