
I am confused about the calculation of cross entropy in PyTorch. If I want to calculate the cross entropy between 2 tensors and the target tensor is not a one-hot label, which loss should I use? It is quite common to calculate the cross entropy between 2 probability distributions, rather than between a prediction and a fixed one-hot label.

The basic loss function CrossEntropyLoss forces the target to be an index integer, so it is not applicable in this case. BCELoss seems to work, but it gives an unexpected result. The expected formula for the cross entropy is

-sum_i yi*log(pi)

But BCELoss calculates the BCE of each dimension, which is expressed as

-yi*log(pi)-(1-yi)*log(1-pi)

Compared with the first equation, the -(1-yi)*log(1-pi) term should not be involved. Here is an example using BCELoss; we can see the second term shows up in each dimension's result, which makes the result differ from the correct one.

import torch.nn as nn
import torch
from math import log

a = torch.Tensor([0.1,0.2,0.7])  # predicted probabilities
y = torch.Tensor([0.2,0.2,0.6])  # target probabilities
L = nn.BCELoss(reduction='none')
y1 = -0.2 * log(0.1) - 0.8 * log(0.9)  # manual BCE of the first dimension
print(L(a, y))
print(y1)

And the result is

tensor([0.5448, 0.5004, 0.6956])
0.5448054311250702

If we sum the results over all the dimensions, the final cross entropy doesn't match the expected value, because every dimension includes the -(1-yi)*log(1-pi) term. In contrast, TensorFlow calculates the correct cross entropy value with CategoricalCrossentropy. Here is an example with the same setting; we can see the cross entropy is calculated in the same way as the first formula.

import tensorflow as tf
from math import log
L = tf.losses.CategoricalCrossentropy()
a = tf.convert_to_tensor([0.1,0.2,0.7])
y = tf.convert_to_tensor([0.2,0.2,0.6])
y_ = -0.2* log(0.1) - 0.2 * log(0.2) - 0.6 * log(0.7)

print(L(y,a), y_)

And the result is

tf.Tensor(0.9964096, shape=(), dtype=float32) 0.9964095674488687

Is there any function that can calculate the correct cross entropy in PyTorch, using the first formula, just like CategoricalCrossentropy in TensorFlow?


3 Answers


The fundamental problem is that you are incorrectly using the BCELoss function.

Cross-entropy loss is what you want. It is used to compute the loss between two arbitrary probability distributions. Indeed, its definition is exactly the equation that you provided:

H(p, q) = -sum_x p(x)*log(q(x))

where p is the target distribution and q is your predicted distribution. See this StackOverflow post for more information.

In your example where you provide the line

y = tf.convert_to_tensor([0.2, 0.2, 0.6])

you are implicitly modeling a multi-class classification problem where the target class can be one of three classes (the length of that tensor). More specifically, that line is saying that for this one data instance, class 0 has probability 0.2, class 1 has probability 0.2, and class 2 has probability 0.6.

The problem you are having is that PyTorch's BCELoss computes the binary cross-entropy loss, which is formulated differently. Binary cross-entropy loss computes the cross-entropy for classification problems where the target class can be only 0 or 1.

In binary cross-entropy, you only need one probability, e.g. 0.2, meaning that the probability of the instance being class 1 is 0.2. Correspondingly, class 0 has probability 0.8.

If you give the same tensor [0.2, 0.2, 0.6] to BCELoss, you are modeling a situation where there are three data instances, where data instance 0 has probability 0.2 of being class 1, data instance 1 has probability 0.2 of being class 1, and data instance 2 has probability 0.6 of being class 1.
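To see this concretely, here is a small sketch that compares BCELoss element-wise against the binary formula above:

import torch
import torch.nn as nn

# BCELoss treats each element as an independent binary classification,
# so every entry picks up the -(1-yi)*log(1-pi) term
p = torch.tensor([0.1, 0.2, 0.7])  # predicted probability of class 1, per instance
y = torch.tensor([0.2, 0.2, 0.6])  # target probability of class 1, per instance
bce = nn.BCELoss(reduction='none')(p, y)
manual = -y * torch.log(p) - (1 - y) * torch.log(1 - p)
print(bce)     # tensor([0.5448, 0.5004, 0.6956])
print(manual)  # the same values, element by element

Each entry matches the binary formula, which confirms that BCELoss is modeling three independent binary problems rather than one three-class distribution.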

Now, to your original question:

If I want to calculate the cross entropy between 2 tensors and the target tensor is not a one-hot label, which loss should I use?

Unfortunately, PyTorch does not have a cross-entropy function that takes in two probability distributions. See this question: https://discuss.pytorch.org/t/how-should-i-implement-cross-entropy-loss-with-continuous-target-outputs/10720

The recommendation is to implement your own function using its equation definition. Here is code that works:

def cross_entropy(input, target):
    # -sum over classes of target * log(predicted probability), averaged over the batch
    return torch.mean(-torch.sum(target * torch.log(input), 1))


y = torch.Tensor([[0.2, 0.2, 0.6]])
yhat = torch.Tensor([[0.1, 0.2, 0.7]])
cross_entropy(yhat, y)
# tensor(0.9964)

It provides the answer that you wanted.
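If your predictions come as raw logits rather than probabilities (an assumption about your setup), a numerically safer sketch of the same idea uses log_softmax instead of taking the log of already-normalized probabilities:

import torch
import torch.nn.functional as F

def cross_entropy_with_logits(logits, target):
    # log_softmax is more stable than softmax followed by log
    return torch.mean(torch.sum(-target * F.log_softmax(logits, dim=1), dim=1))

y = torch.Tensor([[0.2, 0.2, 0.6]])
logits = torch.log(torch.Tensor([[0.1, 0.2, 0.7]]))  # logits whose softmax is [0.1, 0.2, 0.7]
print(cross_entropy_with_logits(logits, y))
# tensor(0.9964)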

stackoverflowuser2010
  • What an excellent answer! Thank you so much. Btw, it really surprises me that Pytorch doesn't provide an official API for this case. – Zhongzheng_11 Aug 03 '21 at 03:43
  • @stackoverflowuser2010 It is not true that "PyTorch does not have a cross-entropy function that takes in two probability distributions". Take a look at https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html (See also the answer from user10517719.) – ATony Dec 30 '21 at 18:28

Update: from version 1.10, PyTorch supports class probability targets in CrossEntropyLoss, so you can now simply use:

criterion = torch.nn.CrossEntropyLoss()
loss = criterion(x, y)

where x is the input and y is the target. When y has the same shape as x, it is treated as class probabilities. Note that x is expected to contain raw, unnormalized scores for each class, while y is expected to contain probabilities for each class (for example, the output of a softmax layer). You can find details in the docs.
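For the numbers from the question, a quick sketch (assuming PyTorch >= 1.10): since CrossEntropyLoss applies softmax to x internally, we can pass the log of the predicted probabilities so that the softmax recovers them:

import torch

criterion = torch.nn.CrossEntropyLoss()
y = torch.Tensor([[0.2, 0.2, 0.6]])             # target class probabilities
x = torch.log(torch.Tensor([[0.1, 0.2, 0.7]]))  # logits whose softmax is [0.1, 0.2, 0.7]
print(criterion(x, y))
# tensor(0.9964)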


Maybe you should try the torch.nn.CrossEntropyLoss function.

alihank