I want to build a model that predicts the next character based on the previous characters. I have split the text into sequences of integers of length 100 (using a Dataset and DataLoader).
Dimensions of my input and target variables are:
inputs dimension: (batch_size, sequence_length). In my case (128, 100)
targets dimension: (batch_size, sequence_length). In my case (128, 100)
After the forward pass I get predictions of dimension (batch_size, sequence_length, vocabulary_size), which in my case is (128, 100, 44).
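(For context, I'm not showing the model itself; a minimal sketch that produces logits of exactly this shape might look like the following, where the CharLSTM name and the layer sizes are just made-up stand-ins:)

import torch
import torch.nn as nn

class CharLSTM(nn.Module):  # hypothetical stand-in for my actual model
    def __init__(self, vocab_size=44, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x):                # x: (batch, seq_len) integer ids
        h, _ = self.lstm(self.embed(x))  # h: (batch, seq_len, hidden_dim)
        return self.head(h)              # logits: (batch, seq_len, vocab_size)

logits = CharLSTM()(torch.randint(44, (128, 100)))
print(logits.shape)  # torch.Size([128, 100, 44])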
But when I calculate the loss using the nn.CrossEntropyLoss() function:
import torch
import torch.nn as nn

batch_size = 128
sequence_length = 100
number_of_classes = 44
# create a random tensor with the shape of my model's output
output = torch.rand(batch_size, sequence_length, number_of_classes)
# create a tensor of random targets
target = torch.randint(number_of_classes, (batch_size, sequence_length)).long()
# define the loss function and calculate the loss
criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)
print(loss)
I get an error:
ValueError: Expected target size (128, 44), got torch.Size([128, 100])
My question is: how should I handle the calculation of the loss for a many-to-many LSTM prediction, especially the sequence dimension? According to the nn.CrossEntropyLoss docs, the input must have dimensions (N, C, d1, d2, ..., dK), where N is the batch size and C is the number of classes. But what is d? Is it related to the sequence length?
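From the error message it looks like the loss function is reading dim 1 of my output (sequence_length = 100) as the class dimension C and dim 2 (44) as d1, which would explain why it expects a target of size (128, 44). If d1 is indeed meant to be the sequence length, then, continuing from the snippet above, both of these rearrangements run without error, though I'm not sure which (if either) is the intended approach:

# option 1: move the class dimension to position 1, giving
# input (N, C, L) = (128, 44, 100) and target (N, L) = (128, 100)
loss_permuted = criterion(output.permute(0, 2, 1), target)

# option 2: flatten batch and sequence into one dimension, giving
# input (N*L, C) = (12800, 44) and target (N*L,) = (12800,)
loss_flat = criterion(output.reshape(-1, number_of_classes), target.reshape(-1))

print(loss_permuted, loss_flat)  # identical values with the default 'mean' reduction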