I want to build a task-oriented dialog chatbot for booking restaurants. Every dialog has a different length (e.g. one has 5 turns, which is 10 sentences, while another has 6 turns, which is 12 sentences in total), so I don't know how to batch the dataset.

Could you point me to a tutorial or GitHub example?

andy

1 Answer

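There are several related questions on Stack Overflow, and I liked the explanation in the answer provided here. The TL;DR is to use a PackedSequence: pad the variable-length sequences to a common length, then pack them so that the RNN skips the padded positions. The answer I linked to provides the following example (copied from the link):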

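import torch

# Two sequences of different lengths (3 and 2 tokens).
a = [torch.tensor([1, 2, 3]), torch.tensor([3, 4])]

# Pad with zeros up to the longest sequence; the result has shape (batch, max_len).
b = torch.nn.utils.rnn.pad_sequence(a, batch_first=True)
# b is tensor([[1, 2, 3],
#              [3, 4, 0]])

# Pack using the true lengths (in descending order, unless enforce_sorted=False is passed).
packed = torch.nn.utils.rnn.pack_padded_sequence(b, batch_first=True, lengths=[3, 2])
# packed.data is tensor([1, 3, 2, 4, 3]); packed.batch_sizes is tensor([2, 2, 1])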
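
To tie this back to batching dialogs of different lengths, here is a rough sketch of how the same padding and packing could be done per batch through a custom `collate_fn` passed to `DataLoader`. The toy token ids, the name `collate_dialogs`, the vocabulary size, and the GRU sizes are all made up for illustration, and it assumes each dialog has already been flattened into a single 1-D tensor of token ids with 0 reserved for padding.

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence
from torch.utils.data import DataLoader

# Hypothetical toy data: each dialog is a 1-D tensor of token ids.
# Id 0 is assumed to be reserved for padding and never used as a real token.
dialogs = [
    torch.tensor([5, 8, 2, 9, 4]),
    torch.tensor([7, 3]),
    torch.tensor([6, 1, 2]),
    torch.tensor([4, 4, 9, 1]),
]

def collate_dialogs(batch):
    """Pad a list of variable-length 1-D tensors; return (padded, lengths)."""
    lengths = torch.tensor([len(seq) for seq in batch])
    padded = pad_sequence(batch, batch_first=True, padding_value=0)
    return padded, lengths

# A plain list works as a map-style dataset (it has __getitem__ and __len__).
loader = DataLoader(dialogs, batch_size=2, shuffle=False, collate_fn=collate_dialogs)

# Illustrative model pieces; the sizes are arbitrary.
embedding = torch.nn.Embedding(num_embeddings=10, embedding_dim=4, padding_idx=0)
rnn = torch.nn.GRU(input_size=4, hidden_size=6, batch_first=True)

for padded, lengths in loader:
    # enforce_sorted=False lets the batch stay in its original order.
    packed = pack_padded_sequence(
        embedding(padded), lengths, batch_first=True, enforce_sorted=False
    )
    _, hidden = rnn(packed)
    # hidden has shape (num_layers, batch_size, hidden_size)
```

Each batch is only padded to the longest dialog in that batch, so grouping dialogs of similar length together should reduce wasted padding.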
Shagun Sodhani