I want to build a model that predicts the next character based on the previous characters. I have split the text into sequences of integers of length 100 (using a Dataset and DataLoader).
Dimensions of my input and target variables are:
inputs dimension: (batch_size, sequence_length). In my case (128, 100)
targets dimension: (batch_size, sequence_length). In my case (128, 100)
After the forward pass I get predictions of dimension (batch_size, sequence_length, vocabulary_size), which in my case is (128, 100, 44).
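(For context, I'm not showing the model itself; a minimal sketch that produces logits of exactly this shape might look like the following, where the CharLSTM name and the layer sizes are just made-up stand-ins:)

import torch
import torch.nn as nn

class CharLSTM(nn.Module):  # hypothetical stand-in for my actual model
    def __init__(self, vocab_size=44, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x):                # x: (batch, seq_len) integer ids
        h, _ = self.lstm(self.embed(x))  # h: (batch, seq_len, hidden_dim)
        return self.head(h)              # logits: (batch, seq_len, vocab_size)

logits = CharLSTM()(torch.randint(44, (128, 100)))
print(logits.shape)  # torch.Size([128, 100, 44])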
But when I calculate the loss using the nn.CrossEntropyLoss() function:
import torch
import torch.nn as nn

batch_size = 128
sequence_length = 100
number_of_classes = 44
# create a random tensor with the shape of my model's output
output = torch.rand(batch_size, sequence_length, number_of_classes)
# create a tensor of random targets
target = torch.randint(number_of_classes, (batch_size, sequence_length)).long()
# define the loss function and calculate the loss
criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)
print(loss)
I get an error:
ValueError: Expected target size (128, 44), got torch.Size([128, 100])
My question is: how should I handle the calculation of the loss for a many-to-many LSTM prediction, especially the sequence dimension? According to the nn.CrossEntropyLoss docs, the input must have dimensions (N, C, d1, d2, ..., dK), where N is the batch size and C is the number of classes. But what is d? Is it related to the sequence length?
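From the error message it looks like the loss function is reading dim 1 of my output (sequence_length = 100) as the class dimension C and dim 2 (44) as d1, which would explain why it expects a target of size (128, 44). If d1 is indeed meant to be the sequence length, then, continuing from the snippet above, both of these rearrangements run without error, though I'm not sure which (if either) is the intended approach:

# option 1: move the class dimension to position 1, giving
# input (N, C, L) = (128, 44, 100) and target (N, L) = (128, 100)
loss_permuted = criterion(output.permute(0, 2, 1), target)

# option 2: flatten batch and sequence into one dimension, giving
# input (N*L, C) = (12800, 44) and target (N*L,) = (12800,)
loss_flat = criterion(output.reshape(-1, number_of_classes), target.reshape(-1))

print(loss_permuted, loss_flat)  # identical values with the default 'mean' reduction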