Questions tagged [seq2seq]

Seq2Seq is a sequence-to-sequence learning add-on for the Keras Python deep learning library.

318 questions
14
votes
2 answers

Multilayer Seq2Seq model with LSTM in Keras

I was making a seq2seq model in Keras. I had built a single-layer encoder and decoder and they were working fine. But now I want to extend it to a multi-layer encoder and decoder. I am building it using the Keras Functional API. Training:- Code for…
SAGAR
  • 151
  • 1
  • 7
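
A minimal sketch of the stacking pattern this question asks about, written with the Keras functional API (layer sizes and token counts below are illustrative, not the asker's values): stack `return_sequences=True` LSTMs in both encoder and decoder, and seed the decoder stack with the encoder's final states.

```python
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

num_encoder_tokens, num_decoder_tokens, latent_dim = 70, 90, 256

# Encoder: two stacked LSTMs; only the top layer's final states are passed on here.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
enc_seq = LSTM(latent_dim, return_sequences=True)(encoder_inputs)
_, state_h, state_c = LSTM(latent_dim, return_state=True)(enc_seq)

# Decoder: mirror the two layers, each initialised from the encoder's final states.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
dec_seq = LSTM(latent_dim, return_sequences=True)(decoder_inputs,
                                                  initial_state=[state_h, state_c])
dec_seq = LSTM(latent_dim, return_sequences=True)(dec_seq,
                                                  initial_state=[state_h, state_c])
decoder_outputs = Dense(num_decoder_tokens, activation="softmax")(dec_seq)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")
```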
12
votes
1 answer

Prepare Decoder of a Sequence to Sequence Network in PyTorch

I was working with sequence-to-sequence models in PyTorch. A sequence-to-sequence model comprises an Encoder and a Decoder. The Encoder converts a (batch_size X input_features X num_of_one_hot_encoded_classes) -> (batch_size X input_features X…
Shubhashis
  • 10,411
  • 11
  • 33
  • 48
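
For context, a hedged sketch of the encoder-to-decoder hand-off the question describes (sizes and module choices are illustrative): the encoder's final hidden state "prepares" the decoder, which is then unrolled one step at a time.

```python
import torch
import torch.nn as nn

batch_size, src_len, num_classes, hidden = 4, 10, 50, 128

encoder = nn.GRU(input_size=num_classes, hidden_size=hidden, batch_first=True)
decoder_cell = nn.GRUCell(input_size=num_classes, hidden_size=hidden)
out_proj = nn.Linear(hidden, num_classes)

src = torch.randn(batch_size, src_len, num_classes)   # one-hot-like source inputs
_, h_n = encoder(src)                                  # h_n: (1, batch, hidden)

# Seed the decoder with the encoder's last state and a start-of-sequence vector,
# then feed each prediction back in as the next input.
hidden_state = h_n.squeeze(0)                          # (batch, hidden)
dec_input = torch.zeros(batch_size, num_classes)       # <sos> placeholder
for _ in range(5):                                     # a few decoding steps
    hidden_state = decoder_cell(dec_input, hidden_state)
    logits = out_proj(hidden_state)                    # (batch, num_classes)
    dec_input = torch.softmax(logits, dim=-1)
```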
11
votes
3 answers

Why do we do batch matrix-matrix product?

I'm following the PyTorch seq2seq tutorial, and the torch.bmm method is used like below: attn_applied = torch.bmm(attn_weights.unsqueeze(0), encoder_outputs.unsqueeze(0)) I understand why we need to multiply attention weight and…
aerin
  • 20,607
  • 28
  • 102
  • 140
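
The shapes make the call clearer; a small sketch with illustrative sizes matching the tutorial's single-example decoding step:

```python
import torch

max_len, hidden_size = 10, 256
attn_weights = torch.softmax(torch.randn(1, max_len), dim=1)   # (1, max_len)
encoder_outputs = torch.randn(max_len, hidden_size)             # (max_len, hidden)

# unsqueeze(0) adds the batch dimension bmm requires:
# (1, 1, max_len) @ (1, max_len, hidden) -> (1, 1, hidden)
attn_applied = torch.bmm(attn_weights.unsqueeze(0),
                         encoder_outputs.unsqueeze(0))
print(attn_applied.shape)  # torch.Size([1, 1, 256])
```

In other words, the batched product is just the weighted sum of encoder outputs, computed for every item in the batch at once.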
9
votes
2 answers

Seq2Seq model learns to only output EOS token (</s>) after a few iterations

I am creating a chatbot trained on the Cornell Movie Dialogs Corpus using NMT. I am basing my code in part on https://github.com/bshao001/ChatLearner and https://github.com/chiphuyen/stanford-tensorflow-tutorials/tree/master/assignments/chatbot. During…
noel
  • 99
  • 1
  • 5
6
votes
1 answer

Keras, model trains successfully but generating predictions gives ValueError: Graph disconnected: cannot obtain value for tensor KerasTensor

I created a Seq2Seq model for text summarization. I have two models, one with attention and one without. The one without attention was able to generate predictions but I can't do it for the one with attention even though it fits successfully. This…
BlueMango
  • 463
  • 7
  • 21
6
votes
1 answer

seq2seq to predict next time step

I'm currently trying to predict the next sequence of goods a customer is likely to buy in the next time period. The following example is for illustrative purposes (my actual dataset has around 6 million customer IDs and 5000 different products). My…
M3105
  • 519
  • 2
  • 7
  • 20
6
votes
1 answer

Implementing Luong Attention in PyTorch

I am trying to implement the attention described in Luong et al. 2015 in PyTorch myself, but I couldn't get it to work. Below is my code; I am only interested in the "general" attention case for now. I wonder if I am missing any obvious error. It runs,…
zyxue
  • 7,904
  • 5
  • 48
  • 74
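
For reference, a hedged sketch of the "general" score from Luong et al. 2015, score(h_t, h_s) = h_t^T W_a h_s, batched with torch.bmm (module and variable names are illustrative, not the asker's code):

```python
import torch
import torch.nn as nn

class GeneralAttention(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.W_a = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, decoder_hidden, encoder_outputs):
        # decoder_hidden: (batch, hidden), encoder_outputs: (batch, src_len, hidden)
        scores = torch.bmm(self.W_a(encoder_outputs),          # (batch, src_len, hidden)
                           decoder_hidden.unsqueeze(2))         # (batch, hidden, 1)
        weights = torch.softmax(scores.squeeze(2), dim=1)       # (batch, src_len)
        context = torch.bmm(weights.unsqueeze(1), encoder_outputs)  # (batch, 1, hidden)
        return context.squeeze(1), weights

attn = GeneralAttention(hidden_size=8)
ctx, w = attn(torch.randn(2, 8), torch.randn(2, 5, 8))  # ctx: (2, 8), w: (2, 5)
```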
5
votes
1 answer

Equivalent of tf.contrib.legacy_seq2seq.attention_decoder in tensorflow 2 after upgrade

I have the following code in TensorFlow 1.0. I tried to migrate it to TensorFlow 2.0 using the tf_upgrade_v2 script. However, it didn't find an equivalent function in the TF 2 compat module. I was recommended to use tensorflow_addons. However, I don't…
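
There is no drop-in tf.compat replacement for attention_decoder; in TF 2 the attention pieces live in tensorflow_addons. A hedged sketch of the rough equivalents (sizes are illustrative, and the exact wiring depends on the original TF 1.0 code):

```python
import tensorflow as tf
import tensorflow_addons as tfa

units, batch, src_len = 256, 4, 10
encoder_outputs = tf.random.normal([batch, src_len, units])

# BahdanauAttention + AttentionWrapper roughly replace the attention mechanism
# that tf.contrib.legacy_seq2seq.attention_decoder built internally.
attention = tfa.seq2seq.BahdanauAttention(units=units, memory=encoder_outputs)
decoder_cell = tfa.seq2seq.AttentionWrapper(
    tf.keras.layers.LSTMCell(units), attention, attention_layer_size=units)
```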
5
votes
2 answers

PyTorch: Different Forward Methods for Train and Test/Validation

I'm currently trying to extend a model that is based on FairSeq/PyTorch. During training I need to train two encoders: one with the target sample, and the original one with the source sample. So the current forward function looks like this: def…
qwertz
  • 315
  • 1
  • 4
  • 14
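
One common pattern (an illustrative sketch, not the FairSeq model in question) is to keep a single forward() and branch on self.training / an optional argument, so the second encoder only runs during training:

```python
import torch
import torch.nn as nn

class DualEncoderSeq2Seq(nn.Module):
    def __init__(self, src_encoder, tgt_encoder, decoder):
        super().__init__()
        self.src_encoder, self.tgt_encoder, self.decoder = src_encoder, tgt_encoder, decoder

    def forward(self, src, tgt=None):
        enc = self.src_encoder(src)
        if self.training and tgt is not None:
            # the extra encoder pass is only needed while training
            return self.decoder(enc), self.tgt_encoder(tgt)
        return self.decoder(enc)

# placeholder submodules just to show the two call paths
model = DualEncoderSeq2Seq(nn.Linear(16, 32), nn.Linear(16, 32), nn.Linear(32, 8))
model.train()
out, tgt_enc = model(torch.randn(4, 16), torch.randn(4, 16))
model.eval()
out = model(torch.randn(4, 16))
```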
5
votes
3 answers

Embedding layer outputs NaN

I am trying to learn a seq2seq model. An embedding layer is located in the encoder and it sometimes outputs NaN values after some iterations. I cannot identify the reason. How can I solve this? The problem is the first emb_layer in the forward…
kintsuba
  • 139
  • 2
  • 7
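
When an nn.Embedding starts producing NaNs only after some iterations, the weights have usually been corrupted by an exploding or NaN gradient rather than by the lookup itself; a hedged diagnostic sketch (names are illustrative):

```python
import torch
import torch.nn as nn

emb_layer = nn.Embedding(num_embeddings=1000, embedding_dim=64)

def embedding_is_finite():
    # returns False as soon as any embedding weight has gone NaN/inf
    return torch.isfinite(emb_layer.weight).all().item()

# inside the training loop, before optimizer.step(), gradient clipping often
# prevents the corruption in the first place:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```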
5
votes
1 answer

Keras - seq2seq model predicting same output for all test inputs

I am trying to build a seq2seq model using LSTM in Keras. Currently working on the English-to-French pairs dataset, 10k pairs (the original dataset has 147k pairs). After training is completed, while trying to predict the output for a given input sequence…
Sunil
  • 141
  • 1
  • 9
5
votes
0 answers

Graph building fails at tf.scatter_nd due to placeholder shape limitations

Using scatter_nd to project an attention distribution onto another distribution, essentially creating a distribution that references a vocabulary. indices = tf.stack((batch_nums, encoder_batch), axis=2) shape = [batch_size,…
Arya Vohra
  • 71
  • 1
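
A hedged, fixed-shape sketch of the projection the question describes (concrete sizes stand in for the placeholders; variable names follow the snippet):

```python
import tensorflow as tf

batch_size, src_len, vocab_size = 2, 4, 10
attn_dists = tf.random.uniform([batch_size, src_len])       # attention over source tokens
encoder_batch = tf.constant([[1, 3, 5, 7], [0, 2, 4, 6]])    # vocab id of each source token

batch_nums = tf.tile(tf.expand_dims(tf.range(batch_size), 1), [1, src_len])
indices = tf.stack((batch_nums, encoder_batch), axis=2)      # (batch, src_len, 2)
shape = [batch_size, vocab_size]
vocab_dists = tf.scatter_nd(indices, attn_dists, shape)      # (batch, vocab_size)
```

With TF 1.x placeholders the batch dimension is often None, which is what makes graph building fail here; computing the shape at run time with tf.shape instead of a Python list is the usual workaround.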
5
votes
1 answer

tensorflow code TypeError: unsupported operand type(s) for *: 'int' and 'Flag'

BATCH_QUEUE_MAX = 100 self._data_path = data_path self._vocab = vocab self._hps = hps self._single_pass = single_pass # Initialize a queue of Batches waiting to be used, and a queue of Examples waiting to be batched self._batch_queue =…
dongmei
  • 89
  • 1
  • 6
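
This error usually means an absl Flag object (rather than its value) ended up in hps, so an expression like hps.batch_size * BATCH_QUEUE_MAX multiplies an int by a Flag. A hedged sketch of the usual fix for that style of code (attribute names assumed from the snippet, TF 1.x flags):

```python
import tensorflow as tf

FLAGS = tf.app.flags.FLAGS

hps_dict = {}
for key, val in FLAGS.__flags.items():
    # in newer TF 1.x releases, val is a Flag object; store the plain value instead
    hps_dict[key] = val.value if hasattr(val, "value") else val
```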
4
votes
1 answer

Restrict Vocab for BERT Encoder-Decoder Text Generation

Is there any way to restrict the vocabulary of the decoder in a Huggingface BERT encoder-decoder model? I'd like to force the decoder to choose from a small vocabulary when generating text rather than BERT's entire ~30k vocabulary.
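
One hedged option with Hugging Face generation is a prefix_allowed_tokens_fn that only ever returns ids from the small target vocabulary (the model and tokens below are purely illustrative):

```python
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased")

allowed_ids = tokenizer.convert_tokens_to_ids(["yes", "no", "maybe", "[SEP]"])

def allowed_tokens(batch_id, input_ids):
    # same small whitelist at every decoding step
    return allowed_ids

inputs = tokenizer("is this restricted?", return_tensors="pt")
out = model.generate(inputs.input_ids,
                     decoder_start_token_id=tokenizer.cls_token_id,
                     prefix_allowed_tokens_fn=allowed_tokens)
```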
4
votes
1 answer

Where to find a Seq2SeqTrainer to import into project?

Like the title says, I require a Seq2SeqTrainer for my project, but the files on GitHub are not available and return a 404. I use this code to try to import it: !wget…
BzeQ
  • 93
  • 1
  • 11
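
Recent transformers releases ship the trainer with the library itself, so fetching the old examples/seq2seq files is unnecessary; a hedged sketch:

```python
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(output_dir="out", predict_with_generate=True)
# trainer = Seq2SeqTrainer(model=model, args=training_args,
#                          train_dataset=train_ds, eval_dataset=val_ds)
```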