I've been having problems getting my data to fit the dimensions required by PyTorch's GRU.
- My input is a 256-long float vector, in batches of 64, so the size of a batch tensor is [64, 256].
- According to the PyTorch documentation, GRU takes input of size [batch_size, sequence_length, input_size] (with batch_first=True). Now I'm not sure whether sequence_length corresponds to the length of the output sequence, nor am I sure what input_size would be here (256?).
- My GRU is supposed to take the whole vector as input, generate an output, and pass that output to the next GRU cell as its input. This ought to continue until a sequence of 128 outputs has been generated. My idea for the GRU network is in the attached picture (there is also a rough code sketch of this loop below).
- Each of the outputs will be passed through a 256 -> 42 fully connected layer, and a token from the 42-letter alphabet will be chosen.
What this network is going to do is take a 256-long encoded vector representation of a molecule and learn to generate the corresponding SELFIES string (a text-based molecule representation), padded to a length of 128, with tokens from an alphabet of 42 'letters'.
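Roughly, the generation loop I have in mind looks like this (just a sketch of the idea with toy tensors and placeholder names, not working code from my project; I've set batch_first=True here only so the shapes line up with my [64, 256] batches, and I'm not sure that's the right way to drive the GRU):

import torch
import torch.nn as nn

batch_size, latent_size, hidden_size = 64, 256, 256
output_len, alphabet_size = 128, 42

gru = nn.GRU(input_size=latent_size, hidden_size=hidden_size, batch_first=True)
fc = nn.Linear(hidden_size, alphabet_size)

z = torch.randn(batch_size, latent_size)      # encoded molecule vectors, [64, 256]
h = torch.zeros(1, batch_size, hidden_size)   # initial hidden state
step_input = z.unsqueeze(1)                   # [64, 1, 256]: one time step per molecule

tokens = []
for _ in range(output_len):
    out, h = gru(step_input, h)               # out: [64, 1, 256]
    logits = fc(out)                          # [64, 1, 42]
    tokens.append(logits.argmax(dim=2))       # choose a token from the 42-letter alphabet
    step_input = out                          # feed this output back in as the next input

tokens = torch.cat(tokens, dim=1)             # [64, 128]: the padded SELFIES token sequence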
Now, I have no idea how to reshape the input tensor so that the GRU accepts it as input, according to the drawing I attached.
Thanks in advance for your help.
I tried calling .unsqueeze(1) on the input tensor. This gave me an output of shape [64, 1, 256], which in my model would correspond to a batch of 64 one-token outputs.
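For reference, here is the reshape I tried, together with a toy check of the two input layouts from the documentation (separate from my model code):

import torch
import torch.nn as nn

x = torch.randn(64, 256)     # stand-in for one batch of encoded molecules
x = x.unsqueeze(1)
print(x.shape)               # torch.Size([64, 1, 256])

# default layout: input is [seq_len, batch, input_size]
gru = nn.GRU(input_size=256, hidden_size=256)
out, _ = gru(torch.randn(128, 64, 256))
print(out.shape)             # torch.Size([128, 64, 256])

# batch_first=True: input is [batch, seq_len, input_size]
gru_bf = nn.GRU(input_size=256, hidden_size=256, batch_first=True)
out, _ = gru_bf(torch.randn(64, 128, 256))
print(out.shape)             # torch.Size([64, 128, 256])

My decoder class so far: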
import torch
import torch.nn as nn


class DecoderNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size, output_len):
        super(DecoderNet, self).__init__()
        # GRU parameters
        self.input_size = input_size    # = 256
        self.hidden_size = hidden_size  # = 256
        self.num_layers = num_layers    # = 1
        # output token count (alphabet size)
        self.output_size = output_size  # = 42
        # output length, i.e. number of GRU time steps
        self.output_len = output_len    # = 128
        # torch.nn layers
        self.gru = nn.GRU(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)
        self.fc = nn.Linear(hidden_size, output_size)
        self.softmax = nn.Softmax(dim=2)
        self.relu = nn.ReLU()

    def forward(self, x, h):
        # so far only the raw GRU output is returned; fc and softmax are not applied yet
        out, h = self.gru(x, h)
        return out, h

    def init_hidden(self, batch_size):
        # initial hidden state of shape [num_layers, batch_size, hidden_size]
        h0 = torch.zeros(self.num_layers, batch_size, self.hidden_size)
        return h0
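And a minimal way to reproduce the shape I'm getting (hypothetical driver code, not my actual training loop; passing None just lets the GRU create a zero hidden state):

decoder = DecoderNet(input_size=256, hidden_size=256, num_layers=1,
                     output_size=42, output_len=128)
x = torch.randn(64, 256).unsqueeze(1)   # [64, 1, 256]
out, h = decoder(x, None)
print(out.shape)                        # torch.Size([64, 1, 256])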