
I have a test list and a prediction list, each containing 4000 elements, like in this example:

      test_list=[1,0,0,1,0,.....]
prediction_list=[1,1,0,1,0......]

How can I find the binary cross entropy between these 2 lists in Python? I tried using the log_loss function from sklearn:

log_loss(test_list,prediction_list)

but the output of the loss function was something like 10.5, which seemed off to me. Am I using the function the wrong way, or should I use another implementation?
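
For reference, here is a minimal runnable version of that call, with short placeholder lists standing in for the real 4000-element ones:

from sklearn.metrics import log_loss

# shortened placeholder lists; the real ones each have 4000 elements
test_list       = [1, 0, 0, 1, 0]
prediction_list = [1, 1, 0, 1, 0]

# this also prints a surprisingly large value, just like the ~10.5 on the full lists
print(log_loss(test_list, prediction_list))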

Alastor
  • Show what you did with the `log_loss` function? – Mihai Chelaru Jan 15 '19 at 22:47
  • @MihaiChelaru updated the OP – Alastor Jan 15 '19 at 22:50
  • The reason I commented that is because you should strive to create a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) of what you did, what the output was, and how it differed from the expected output. If your question is too vague it may put people off answering it, or make it too difficult to give you a concrete answer as to what might be going wrong with your implementation. Do your best to reduce the work people have to do when answering and you'll be rewarded in kind. – Mihai Chelaru Jan 15 '19 at 22:58
  • @MihaiChelaru You are correct, sir. It's just that I'm a novice when it comes to loss functions, and I wanted to get a fresh answer based on my 2 lists regardless of what I did afterwards, since I'm open to whatever implementation you would propose! – Alastor Jan 15 '19 at 23:06

3 Answers


For the log_loss function you are supposed to pass in the probabilities of predicting 1 or 0, not the predicted labels. Cross entropy loss is not defined for probabilities of exactly 0 and 1, so your prediction list should either be

prediction_list = [0.8, 0.4, 0.3, ...]

where the probabilities are assumed to be for the positive label, or

prediction_list = [[0.8, 0.2], [0.4, 0.6], [0.3, 0.7], ...]

with one pair of class probabilities per sample. The result you are seeing is because of the eps clipping in the scikit-learn implementation.

I am assuming that your prediction_list is a label list, because it's rare for a model to predict probabilities of exactly 0 and 1.
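
As a rough illustration of the difference (a minimal sketch with made-up numbers, not your actual lists):

from sklearn.metrics import log_loss

y_true = [1, 0, 0, 1, 0]

# Hard 0/1 labels passed as "probabilities": older scikit-learn versions clip
# them to eps (about 1e-15), so a wrong prediction gets a per-sample loss of
# roughly -log(1e-15) ~ 34.5; here the average is about 6.9 with that old
# default, and newer versions may return inf instead of clipping.
print(log_loss(y_true, [1, 1, 0, 1, 0]))

# Predicted probabilities of the positive class: the intended usage.
print(log_loss(y_true, [0.9, 0.6, 0.2, 0.8, 0.1]))   # about 0.31

Going by that, a loss of about 10.5 on 0/1 labels roughly corresponds to 10.5 / 34.5 ≈ 30% of the 4000 predictions being wrong.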

sagarwal
  • Thanks for the answer, friend. Yeah, it's a label list, you are correct. How can I turn this into probabilities then? – Alastor Jan 15 '19 at 23:43
  • Yes, your predictor model should give you prediction probabilities. Also, if the answer is right, please accept it as the correct answer. – sagarwal Jan 16 '19 at 00:31

I assume you already have the data and labels, that you have split them into train and test sets (data_train, labels_train, data_test, labels_test), and that you get the prediction list from a model like the one below. You then need to get the probabilities from the model by calling clf.predict_proba(data_test), as seen below.

import numpy as np
from sklearn.metrics import log_loss
from sklearn.linear_model import LogisticRegression

# test_list = [1,0,0,1,0,.....]
# prediction_list = [1,1,0,1,0......]

# Model learning and prediction
clf = LogisticRegression()
clf.fit(data_train, labels_train)                   # fit on the training split first
prediction_list = clf.predict(data_test)            # hard 0/1 labels
pred_probabilities = clf.predict_proba(data_test)   # class probabilities, shape (n_samples, 2)

# Evaluation of the prediction
print("The binary cross entropy loss is : %f" % log_loss(labels_test, pred_probabilities))

I'm still new to machine learning, so take this with a grain of salt.

thelaw

You're using it right. The values of binary cross entropy are unbounded: they range from 0 to infinity. See https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html
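
As a quick illustration of that unbounded range (a minimal sketch with toy values, not the lists from the question):

from sklearn.metrics import log_loss

y_true = [1, 1, 1, 1]

# The more confidently wrong the predicted probabilities are,
# the larger the log loss grows; there is no upper bound.
for p in [0.5, 0.1, 0.01, 0.001]:
    print(p, log_loss(y_true, [p] * 4, labels=[0, 1]))   # 0.69, 2.30, 4.61, 6.91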

Filipe Aleixo
  • Thanks for the answer, friend. So based on my result, is my prediction model utter trash? Since, based on the graph in the link above, my prediction rate would stand at around 0.1 to match a 10.5 log loss? Or am I reading this wrong? – Alastor Jan 15 '19 at 23:10