
How do I print the confusion matrix for a logistic regression when changing the threshold over [0.5, 0.6, 0.9] (once with 0.5, once with 0.6, and so on)?

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

X = [[0.7,0.2],[0.9,0.4]]
y = [1,-1]

model = LogisticRegression()
model = model.fit(X,y)

threshold = [0.5,0.6,0.9]

y_pred = model.predict(X)   # predictions at the default 0.5 threshold
CM = confusion_matrix(y, y_pred)

TN = CM[0][0]
FN = CM[1][0]
TP = CM[1][1]
FP = CM[0][1]
JW.
Sarah
  • Hi! I don't understand your question: what is the threshold for? You don't use the variable `threshold` in your code. – JW. Oct 07 '19 at 11:03
  • Please check this [https://stackoverflow.com/questions/32627926/scikit-changing-the-threshold-to-create-multiple-confusion-matrixes](https://stackoverflow.com/questions/32627926/scikit-changing-the-threshold-to-create-multiple-confusion-matrixes) – Bruno Justino Praciano Oct 07 '19 at 11:10

3 Answers


Let's try this!

for i in threshold:
    # take the probability of the positive class (second column) and compare it to the
    # threshold, mapping the result back to the question's 1/-1 labels
    y_predicted = np.where(model.predict_proba(X)[:, 1] > i, 1, -1)
    print(confusion_matrix(y, y_predicted))

predict_proba() returns a NumPy array with two columns: the first column is the probability that target=0 and the second column is the probability that target=1. That is why we add [:, 1] after predict_proba(), to get the probabilities of target=1.
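To see concretely which column is which, here is a small self-contained sketch reusing the toy X and y from the question; model.classes_ tells you the column order of predict_proba:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = [[0.7, 0.2], [0.9, 0.4]]
    y = [1, -1]
    model = LogisticRegression().fit(X, y)

    print(model.classes_)            # [-1  1] -> column 1 of predict_proba is P(y=1)
    proba = model.predict_proba(X)   # shape (n_samples, 2), rows sum to 1
    print(proba[:, 1])               # probability of the positive class for each sample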

Eko Putra

I think an easy approach, in pseudocode (loosely based on Python), would be:

1 - Predict on a set of known values (X): y_prob = model.predict_proba(X), so you get a probability for each input in X.

2 - Then, for each threshold, compute the output: if y_prob > threshold then 1, else 0.

3 - Now get the confusion matrix for each prediction vector you obtained.

If you need a deeper explanation of any point, let me know!
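A minimal sketch of those three steps, assuming the fitted model, X, and true labels y from the question (which uses 1/-1 labels, so the hard labels are mapped accordingly):

    import numpy as np
    from sklearn.metrics import confusion_matrix

    thresholds = [0.5, 0.6, 0.9]

    # 1 - probability of the positive class for each input in X
    y_prob = model.predict_proba(X)[:, 1]

    for t in thresholds:
        # 2 - turn probabilities into hard labels for this threshold
        y_pred = np.where(y_prob > t, 1, -1)
        # 3 - confusion matrix for this threshold
        print(f"threshold={t}")
        print(confusion_matrix(y, y_pred))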

Alfonso
  • How can we find the best threshold, and then use it in that condition ("if y_prob > threshold then 1, else 0")? – Spedo Mar 05 '20 at 15:22
  • Well... trial and error, and it also depends on what you are looking for. There is a trade-off between detecting all the positives and detecting all the negatives (if you say the result is always class 1, you will succeed in finding all of class 1, but you will fail to detect class -1). Look at AUC as a metric for your classifier. – Alfonso Apr 01 '20 at 11:20
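If you want something more systematic than trial and error, one common heuristic is to scan the ROC operating points and pick the threshold that maximizes TPR - FPR (Youden's J). A sketch, with hypothetical y_true labels and y_prob positive-class probabilities:

    import numpy as np
    from sklearn.metrics import roc_curve

    # y_true: true labels, y_prob: predicted probability of the positive class
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    best = np.argmax(tpr - fpr)   # index of the operating point with the largest TPR - FPR
    print(thresholds[best])       # candidate "best" threshold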
def predict_y_from_threshold(model, X, threshold):
    # classify as 1 when the probability of the positive class exceeds the threshold, else 0
    return (model.predict_proba(X)[:, 1] > threshold).astype(int)
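Hypothetical usage (this assumes 0/1-encoded true labels; with the question's 1/-1 labels you would map 0 back to -1):

    from sklearn.metrics import confusion_matrix

    for t in [0.5, 0.6, 0.9]:
        y_pred = predict_y_from_threshold(model, X, t)
        print(confusion_matrix(y_true_01, y_pred))   # y_true_01: the 0/1-encoded true labels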
Bertrand