
I'm confused about how to apply cross-entropy loss for my time series model, where the output has shape [batch_size, classes, time_steps] and the target has shape [batch_size, time_steps, classes]. I'm trying to make the model determine the confidence of the 16 classes at each time step. With the following approach, I get a large loss and the model doesn't seem to be learning:

import torch

batch_size = 256
time_steps = 224
classes = 16

y_est = torch.randn((batch_size, classes, time_steps))
y_true = torch.randn((batch_size, time_steps, classes)).view(batch_size, classes, -1)
loss = torch.nn.functional.cross_entropy(y_est, y_true)

Do you think I've made a mistake here?

papillon

2 Answers


PyTorch documentation for CrossEntropyLoss:

Input shape: (N, C, d1, ..., dK)

Target shape: (N, d1, ..., dK)

where N is the batch size and C is the number of classes, with K >= 1 in the case of K-dimensional loss.

So based on the docs, the code should be

import torch

batch_size = 256
time_steps = 224
classes = 16

y_est = torch.randn((batch_size, classes, time_steps))         # logits: (N, C, d1)
y_true = torch.randint(0, classes, (batch_size, time_steps))   # class indices: (N, d1)
loss = torch.nn.functional.cross_entropy(y_est, y_true)
Hatem
  • But then it's no longer multi-label for each time step, no? Which means I no longer know the confidence of each class at each time step. – papillon Jul 07 '22 at 14:08
  • Your model will predict the probability of each class at each time step, and your label should contain the ground truth (a single class) for each time step. Does this clarify the confusion? – Hatem Jul 07 '22 at 14:13

As @Hatem described, your target tensor should have one dimension less than the prediction tensor, because it is not a one-hot encoding but rather a dense encoding (each value is the class label itself). Your prediction tensor, on the other hand, contains a score (logit) for every possible class.

So here, since your prediction tensor y_est is shaped (batch_size, classes, time_steps), your target tensor should have shape (batch_size, time_steps). If your target is in one-hot-encoded format, you can easily switch back to the required format by applying torch.argmax over the class dimension:

loss = F.cross_entropy(y_est, y_true.argmax(1))
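
For example, a minimal runnable sketch with the shapes from your question (the one-hot target here is generated randomly, purely for illustration):

import torch
import torch.nn.functional as F

batch_size, classes, time_steps = 256, 16, 224

y_est = torch.randn(batch_size, classes, time_steps)                       # model logits: (N, C, d1)
labels = torch.randint(0, classes, (batch_size, time_steps))               # dense class labels, for illustration only
y_true = F.one_hot(labels, num_classes=classes).permute(0, 2, 1).float()   # one-hot target: (N, C, d1)

loss = F.cross_entropy(y_est, y_true.argmax(1))                            # argmax over the class dimension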
Ivan
  • I think I'm starting to get it. What if the target is not one-hot encoded, but rather probabilities? The PyTorch docs mention that `If containing class probabilities, same shape as the input and each value should be between [0,1]`, so I would assume `y_est = (batch_size, classes, time_steps)` and `y_true = (batch_size, classes, time_steps)` are correct? In `y_true`, the values along the class dimension are the probabilities of each class. – papillon Jul 08 '22 at 02:33
  • The PyTorch implementation of `CrossEntropyLoss` does not allow the target to contain class probabilities; it only supports hard class labels, *i.e.* single-label classification tasks. If you want to compute the cross-entropy between two distributions, you should be using a soft-cross-entropy loss function. This can be easily implemented by computing `-target*F.log_softmax(pred, 1)` (a minimal sketch is at the end of this thread). And of course, in this case, both tensors would have the exact same shape. You can read more about it [here](https://stackoverflow.com/questions/68907809/soft-cross-entropy-in-pytorch/68914806#68914806). – Ivan Jul 08 '22 at 06:12
  • 3
    I think that answer is no longer applied, I think Pytorch now support the probabilities, quoted from the manual `Probabilities for each class; useful when labels beyond a single class per minibatch item are required, such as for blended labels, label smoothing, etc.`. The pull request on their github https://github.com/pytorch/pytorch/pull/61044 (hopefully I'm not mistaken) – papillon Jul 08 '22 at 06:18
  • Oh, it seems I was not aware of this PR, and it made it to master. Thanks for pointing that out! Back to your problem though, which version of PyTorch are you using? – Ivan Jul 08 '22 at 06:23
  • Sure man. I just checked, it's 1.12. – papillon Jul 08 '22 at 06:24
  • Sorry, I'm afraid my answer is off-topic; it wasn't about how to apply a soft cross-entropy but rather about how to make the model learn... – Ivan Jul 08 '22 at 06:26
  • I don't necessarily think so; it actually helped me understand the dimensionality for cross-entropy loss, in contrast to my original implementation. I tried the one-hot approach and it seems the model does start to learn. Your sample code above actually helps a lot. – papillon Jul 08 '22 at 06:30
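
For reference, here is a minimal sketch of the soft cross-entropy discussed in the comments above, compared with the built-in probability-target form (this assumes PyTorch >= 1.10, where probability targets were added by the linked PR; shapes are the ones from the question):

import torch
import torch.nn.functional as F

batch_size, classes, time_steps = 256, 16, 224

y_est = torch.randn(batch_size, classes, time_steps)                         # logits
y_true = torch.softmax(torch.randn(batch_size, classes, time_steps), dim=1)  # per-step class probabilities, same shape as y_est

# Manual soft cross-entropy: sum over the class dimension, mean over batch and time
manual = -(y_true * F.log_softmax(y_est, dim=1)).sum(dim=1).mean()

# Built-in probability targets (PyTorch >= 1.10): target has the same shape as the input
builtin = F.cross_entropy(y_est, y_true)

print(torch.allclose(manual, builtin))  # the two should match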