
I was reading up on log-loss and cross-entropy, and it seems like there are 2 approaches for calculating it, based on the following equation.

H(p, q) = -sum_i p_i * log(q_i) = -y * log(y_hat) - (1 - y) * log(1 - y_hat)

The first one is the following.

import numpy as np
from sklearn.metrics import log_loss


def cross_entropy(predictions, targets):
    N = predictions.shape[0]
    ce = -np.sum(targets * np.log(predictions)) / N
    return ce


predictions = np.array([[0.25,0.25,0.25,0.25],
                        [0.01,0.01,0.01,0.97]])
targets = np.array([[1,0,0,0],
                   [0,0,0,1]])

x = cross_entropy(predictions, targets)
print(log_loss(targets, predictions), 'our_answer:', ans)

The output of the previous program is `0.7083767843022996 our_answer: 0.71355817782`, which is almost the same. So that's not the issue.

The above implementation computes the middle part of the equation above.

The second approach is based on the RHS part of the equation above.

res = 0
for act_row, pred_row in zip(targets, np.array(predictions)):
    for class_act, class_pred in zip(act_row, pred_row):
        res += - class_act * np.log(class_pred) - (1-class_act) * np.log(1-class_pred)

print(res/len(targets))

And the output is 1.1549753967602232, which is not quite the same.

I have tried the same implementation with NumPy, but it also didn't work. What am I doing wrong?
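For reference, the vectorized version I mean is roughly this (a sketch, not my exact code):

# vectorized sketch of the same formula: -y*log(y_hat) - (1-y)*log(1-y_hat)
res_np = -np.sum(targets * np.log(predictions)
                 + (1 - targets) * np.log(1 - predictions)) / len(targets)
print(res_np)  # ~1.1549..., same as the loop above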

PS: I am also curious: -y * log(y_hat) seems to me to be the same as -sum(p_i * log(q_i)), so how come there is a -(1 - y) * log(1 - y_hat) part? Clearly I am misunderstanding how -y * log(y_hat) is to be calculated.

  • Your implementation might not be numerically stable for your choice of inputs. – Mateen Ulhaq Mar 25 '18 at 07:54
  • @mateen: I tried with a larger dataset as well; I only used this input to keep the example minimal. I am very sure I am misunderstanding how `y log(y_hat)` is calculated. – Vikash Singh Mar 25 '18 at 08:12

1 Answer


I cannot reproduce the difference in the results you report in the first part (you also refer to an ans variable, which you do not seem to define; I guess it is x):

import numpy as np
from sklearn.metrics import log_loss


def cross_entropy(predictions, targets):
    N = predictions.shape[0]
    ce = -np.sum(targets * np.log(predictions)) / N
    return ce

predictions = np.array([[0.25,0.25,0.25,0.25],
                        [0.01,0.01,0.01,0.97]])
targets = np.array([[1,0,0,0],
                   [0,0,0,1]])

The results:

cross_entropy(predictions, targets)
# 0.7083767843022996

log_loss(targets, predictions)
# 0.7083767843022996

log_loss(targets, predictions) == cross_entropy(predictions, targets)
# True

Your cross_entropy function seems to work fine.

Regarding the second part:

Clearly I am misunderstanding how -y log (y_hat) is to be calculated.

Indeed, reading the fast.ai wiki you have linked to more carefully, you'll see that the RHS of the equation holds only for binary classification (where one of y and 1-y is always zero), which is not the case here: you have a 4-class multinomial classification. So, the correct formulation is

res = 0
for act_row, pred_row in zip(targets, np.array(predictions)):
    for class_act, class_pred in zip(act_row, pred_row):
        res += - class_act * np.log(class_pred)

i.e. discarding the subtraction of (1-class_act) * np.log(1-class_pred).

Result:

res/len(targets)
# 0.7083767843022996

res/len(targets) == log_loss(targets, predictions)
# True
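In vectorized NumPy, this corrected loop is exactly your cross_entropy function from above; just as a sketch:

# vectorized equivalent of the corrected loop (same as the question's cross_entropy)
-np.sum(targets * np.log(predictions)) / len(targets)
# 0.7083767843022996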

On a more general level (the mechanics of log loss & accuracy for binary classification), you may find this answer useful.
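For completeness, here is a toy binary sketch (numbers of my own, not from the question) showing that, when y is binary, the two forms give the same value:

import numpy as np
from sklearn.metrics import log_loss

# toy binary problem (hypothetical numbers, for illustration only)
y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.2, 0.7, 0.6])   # predicted probability of class 1

# binary form: mean of -y*log(y_hat) - (1-y)*log(1-y_hat)
binary_form = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# general form: one-hot targets over the two classes, -sum(p * log(q)) / N
targets_2 = np.column_stack([1 - y_true, y_true])
preds_2 = np.column_stack([1 - y_pred, y_pred])
general_form = -np.sum(targets_2 * np.log(preds_2)) / len(y_true)

print(binary_form, general_form, log_loss(y_true, y_pred))
# all three print the same value (~0.299)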

  • There are 2 parts to the question. The first part is working; the 2nd part is not. That's where the `res` is computed. Please have a look and see if you can help. – Vikash Singh Mar 25 '18 at 10:00
  • @VikashSingh sorry, but reporting different results in the 1st part is not exactly the definition of "it's working"; see update for the 2nd part (was already writing it when you commented) – desertnaut Mar 25 '18 at 10:06
  • Sorry I caused you confusion. I have updated the question. So what you are basically saying is that the formula for log_loss is the same as cross-entropy for the binary case, but log_loss doesn't work the same way for multi-class classification? – Vikash Singh Mar 25 '18 at 12:46
  • 2
    @VikashSingh 1) *practically* speaking, in ML contexts "log loss" & "[categorical] cross-entropy [loss]" refer to the same thing, which is the sum of `-y*log(y_hat)` for all samples; scikit-learn, for example, uses the first term, while Keras uses the second. – desertnaut Mar 25 '18 at 14:35
  • 2
    @VikashSingh 2) now, for the special case of binary `y` only, this formula becomes `-y*log(y_hat) - (1-y)*log(1-y_hat)` (since, for each individual sample, only one term survives, the other being 0), but this is not applicable to your case, which is not binary. – desertnaut Mar 25 '18 at 14:36