
I was working with sequence-to-sequence models in PyTorch. A sequence-to-sequence model consists of an Encoder and a Decoder.

The Encoder converts a tensor of shape (batch_size X input_features X num_of_one_hot_encoded_classes) into one of shape (batch_size X input_features X hidden_size).

The Decoder takes this encoded sequence and converts it into (batch_size X output_features X num_of_one_hot_encoded_classes).

An example would look like this:

[image]

So in the example above, I would need to convert the 22 input features into 10 output features. In Keras, this could be done with RepeatVector(10).

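An example:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, RepeatVector, Dropout

model = Sequential()
model.add(LSTM(256, input_shape=(22, 98)))  # encode the 22 steps into one vector
model.add(RepeatVector(10))                 # repeat that vector for 10 output steps
model.add(Dropout(0.3))
model.add(LSTM(256, return_sequences=True))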

However, I'm not sure whether this is the proper way to convert the input sequence into the output sequence.

So, my question is:

  • What's the standard way to convert the input sequence into the output sequence, e.g. converting from (batch_size, 22, 98) -> (batch_size, 10, 98)? Or how should I prepare the Decoder?

Encoder code snippet (written in PyTorch):

import torch.nn as nn

class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
                            num_layers=1, batch_first=True)

    def forward(self, x):
        # output: (batch_size, seq_len, hidden_size); hidden: (h_n, c_n)
        output, hidden = self.lstm(x)
        return output, hidden
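A quick shape check can make the encoder's outputs concrete. The sketch below repeats the EncoderRNN class from the snippet above so that it runs on its own, and assumes a batch of 64 sequences of length 22 with 98 one-hot classes and a hidden size of 256. Those sizes are illustrative assumptions, not requirements.

```python
import torch
import torch.nn as nn


class EncoderRNN(nn.Module):
    # Same encoder as in the snippet above, repeated so the example is self-contained.
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
                            num_layers=1, batch_first=True)

    def forward(self, x):
        output, hidden = self.lstm(x)
        return output, hidden


# Assumed sizes: batch of 64, sequence length 22, 98 classes, hidden size 256.
encoder = EncoderRNN(input_size=98, hidden_size=256)
x = torch.randn(64, 22, 98)
output, (h_n, c_n) = encoder(x)

print(output.shape)  # per-step outputs: torch.Size([64, 22, 256])
print(h_n.shape)     # final hidden state: torch.Size([1, 64, 256])
print(c_n.shape)     # final cell state: torch.Size([1, 64, 256])
```

Note that the sequence length (22) is preserved by the encoder; changing it to 10 is entirely the decoder's job.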
Shubhashis
  • In your example, `input_features` corresponds to the "sequence length" dimension. Why would you want to specify the output sequence length beforehand, instead of letting the decoder naturally predict an "end-of-sequence" token? – jabalazs Nov 13 '18 at 11:44
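The end-of-sequence idea from the comment can be sketched as a greedy decoding loop that stops once the model emits a designated token. This is only an illustration under assumptions: the names `greedy_decode`, `SOS_INDEX`, `EOS_INDEX`, and `MAX_LEN` are hypothetical, the layers are untrained, and feeding the previous prediction back in as a one-hot vector is one common choice rather than the only one.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 98   # size of the one-hot vocabulary (from the question)
HIDDEN_SIZE = 256  # encoder/decoder hidden size (assumed)
SOS_INDEX = 0      # hypothetical start-of-sequence class
EOS_INDEX = 1      # hypothetical end-of-sequence class
MAX_LEN = 50       # safety cap so decoding always terminates

encoder = nn.LSTM(NUM_CLASSES, HIDDEN_SIZE, batch_first=True)
decoder_cell = nn.LSTMCell(NUM_CLASSES, HIDDEN_SIZE)
to_classes = nn.Linear(HIDDEN_SIZE, NUM_CLASSES)


def greedy_decode(source):
    """Decode one sequence (batch size 1) until EOS or MAX_LEN is reached."""
    with torch.no_grad():
        _, (h, c) = encoder(source)   # h, c: (1, 1, HIDDEN_SIZE)
        h, c = h[0], c[0]             # LSTMCell expects (batch, HIDDEN_SIZE)
        token = torch.tensor([SOS_INDEX])
        result = []
        for _ in range(MAX_LEN):
            step_input = nn.functional.one_hot(token, NUM_CLASSES).float()
            h, c = decoder_cell(step_input, (h, c))
            token = to_classes(h).argmax(dim=1)   # most likely next class
            if token.item() == EOS_INDEX:
                break
            result.append(token.item())
    return result


source = torch.randn(1, 22, NUM_CLASSES)
decoded = greedy_decode(source)
print(len(decoded))  # varies with the (untrained) weights, at most MAX_LEN
```

With this setup the output length is decided by the model at inference time, so no fixed value such as 10 has to be built into the architecture.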

1 Answer

Well, you have two options. The first is to repeat the encoder's last state 10 times and give it as input to the decoder, like this:

import torch
x = torch.randn(64, 22, 98)  # (batch_size, input_len, num_classes)
encoder = torch.nn.LSTM(98, 256, batch_first=True)
encoded, _ = encoder(x)  # (64, 22, 256)
# Keep only the last time step and repeat it for each of the 10 output steps.
decoder_input = encoded[:, -1:].repeat(1, 10, 1)  # (64, 10, 256)
decoder = torch.nn.LSTM(256, 98, batch_first=True)
decoded, _ = decoder(decoder_input)
print(decoded.shape)  # torch.Size([64, 10, 98])
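For a single-layer, unidirectional LSTM, the last time step of the output sequence holds the same values as the final hidden state, so either can serve as the "last state" that gets repeated. The check below is a small sketch of that equivalence; the sizes match the example above and the variable names are illustrative.

```python
import torch

torch.manual_seed(0)
x = torch.randn(64, 22, 98)
encoder = torch.nn.LSTM(98, 256, batch_first=True)
encoded, (h_n, c_n) = encoder(x)

last_output = encoded[:, -1]   # (64, 256): output at the final time step
final_hidden = h_n[-1]         # (64, 256): final hidden state of the last layer
print(torch.allclose(last_output, final_hidden))  # True

# Repeating the final hidden state gives the same decoder input as
# encoded[:, -1:].repeat(1, 10, 1) in the example above.
decoder_input = final_hidden.unsqueeze(1).repeat(1, 10, 1)
print(decoder_input.shape)  # torch.Size([64, 10, 256])
```

This equivalence does not hold for the backward direction of a bidirectional LSTM, where the final backward state corresponds to the first time step of the output.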

Another option is to use an attention mechanism, like this:

# Assumes `encoded` (64, 22, 256) and `decoder` are defined as in the first example.
attention_calculator = torch.nn.Conv1d(256 + 98, 1, kernel_size=1)
hidden = (torch.zeros(1, 64, 98), torch.zeros(1, 64, 98))  # initial (h, c)
outputs = []
for i in range(10):
    # Pair the current hidden state with every encoder step: (64, 256 + 98, 22)
    attention_input = torch.cat([hidden[0][0][:, None, :].expand(-1, 22, -1), encoded], dim=2).permute(0, 2, 1)
    # One weight per encoder step, normalized over the 22 steps: (64, 22)
    attention_value = torch.nn.functional.softmax(attention_calculator(attention_input).squeeze(1), dim=1)
    # Weighted sum of the encoder outputs: (64, 1, 256)
    decoder_input = (attention_value[:, :, None] * encoded).sum(dim=1, keepdim=True)
    output, hidden = decoder(decoder_input, hidden)
    outputs.append(output)
outputs = torch.cat(outputs, dim=1)  # (64, 10, 98)
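To train the attention variant, the layers need to live inside a module so that their parameters are registered together and can be handed to an optimizer. The class below is one possible way to package the loop above. `AttentionDecoder` and its argument names are hypothetical, and it uses a `Linear` scoring layer, which computes the same kind of per-position score as the `Conv1d` with `kernel_size=1` without needing the `permute`.

```python
import torch
import torch.nn as nn


class AttentionDecoder(nn.Module):
    def __init__(self, encoder_size, output_size, output_len):
        super().__init__()
        self.output_size = output_size
        self.output_len = output_len
        # Scores one encoder position from [decoder hidden state, encoder output].
        self.score = nn.Linear(output_size + encoder_size, 1)
        self.lstm = nn.LSTM(encoder_size, output_size, batch_first=True)

    def forward(self, encoded):
        batch_size, src_len, _ = encoded.shape
        h = encoded.new_zeros(1, batch_size, self.output_size)
        c = encoded.new_zeros(1, batch_size, self.output_size)
        outputs = []
        for _ in range(self.output_len):
            # Broadcast the current hidden state over all encoder positions.
            query = h[0].unsqueeze(1).expand(-1, src_len, -1)
            scores = self.score(torch.cat([query, encoded], dim=2)).squeeze(2)
            weights = torch.softmax(scores, dim=1)            # (batch, src_len)
            context = (weights.unsqueeze(2) * encoded).sum(dim=1, keepdim=True)
            output, (h, c) = self.lstm(context, (h, c))
            outputs.append(output)
        return torch.cat(outputs, dim=1)


encoder = nn.LSTM(98, 256, batch_first=True)
decoder = AttentionDecoder(encoder_size=256, output_size=98, output_len=10)
encoded, _ = encoder(torch.randn(64, 22, 98))
decoded = decoder(encoded)
print(decoded.shape)  # torch.Size([64, 10, 98])
```

Because the batch size and source length are read from `encoded` at run time, the same module works for other batch sizes and input lengths without edits.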
Separius