
I'm following this tutorial https://youtu.be/0HDy6n3UD5M?t=1320 where he says he is calculating the false positives, but gets back a numpy array containing what I understand to be both the 'false negatives' and the 'false positives'.

E.g. the confusion matrix is:

cm = confusion_matrix(y_train, y_pred, labels=[1, 0])

array([[250,  83],
       [ 76, 311]])

and he outputs the false positives as

FP = cm.sum(axis=0) - np.diag(cm)

array([76, 83])

Shouldn't the false positives just be 83? I read in another article that he might be calculating 'potential false positives', but what does that mean? The output seems to contain both the FP and the FN rather than a single number.

Rest of the code is:

FN = cm.sum(axis=1) - np.diag(cm)
TP = np.diag(cm)
TN = cm.sum() - (FP + FN + TP)
TPR = TP / (TP + FN)
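
For reference, running those formulas on the confusion matrix above (pasting in the matrix directly rather than recomputing it from y_train and y_pred) gives:

import numpy as np

# The matrix printed above, built with labels=[1, 0],
# so index 0 corresponds to class 1 and index 1 to class 0
cm = np.array([[250,  83],
               [ 76, 311]])

FP = cm.sum(axis=0) - np.diag(cm)   # [ 76,  83]
FN = cm.sum(axis=1) - np.diag(cm)   # [ 83,  76]
TP = np.diag(cm)                    # [250, 311]
TN = cm.sum() - (FP + FN + TP)      # [311, 250]
TPR = TP / (TP + FN)                # [0.75075075, 0.80361757]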
  • Please [edit] your question to make it reasonably self-contained. Many visitors are unable or simply unwilling to view a video just to figure out what you are asking. – tripleee Dec 22 '22 at 15:45
  • You are right, it should be a single number and that video is incorrect – Marat Dec 22 '22 at 15:48
  • @Marat thanks for the responses! do you understand what he's doing with the appended code? – codinggirl123 Dec 22 '22 at 15:55
  • @tripleee would appreciate any input too. thank you! – codinggirl123 Dec 22 '22 at 15:56
  • @AlexanderL.Hayes I actually just found a stackoverflow post with the exact same code, but still don't understand it. If you have time, do you mind explaining it? https://stackoverflow.com/questions/31324218/scikit-learn-how-to-obtain-true-positive-true-negative-false-positive-and-fal – codinggirl123 Dec 22 '22 at 16:07
  • I deleted my previous comment speculating on what the video creator meant: it was a little unkind. I watched a few minutes, and it looks like the author is showing how to compute class-dependent metrics. I think the way they're doing it is more complicated than it needs to be, so I've added an alternative interpretation as an answer. – Alexander L. Hayes Dec 22 '22 at 18:05

1 Answer


It looks like the code in the video is trying to compute metrics in a class-dependent way.

Normally we think of "false positives" as a single number corresponding to an entry in the confusion matrix:

from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 0, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"Number of false positives: {fp}")
# Number of false positives: 1

But we can also frame the false positives in a class-dependent way: multilabel_confusion_matrix computes a 2×2 confusion matrix for each class, giving a (C, 2, 2) array where C is the number of classes:

from sklearn.metrics import multilabel_confusion_matrix

mcm = multilabel_confusion_matrix(y_true, y_pred)
# [[[1 2]
#   [1 2]]
#
#  [[2 1]
#   [2 1]]]

Each 2×2 block is laid out as [[TN, FP], [FN, TP]], so we get a vector of true positives and a vector of false positives, one entry per class:

tps = mcm[:, 1, 1]
# [2 1]

fps = mcm[:, 0, 1]
# [2 1]

This lets us compute metrics like precision for each class:

print(f"Class-dependent precision: {tps / (tps + fps)}")
# Class-dependent precision: [0.5 0.5]
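
The same indexing gives the per-class false negatives, and from those the per-class recall (the TPR = TP / (TP + FN) line in the question):

fns = mcm[:, 1, 0]
# [1 2]

print(f"Class-dependent recall: {tps / (tps + fns)}")
# Class-dependent recall: [0.66666667 0.33333333]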

This is also how you arrive at the numbers in classification_report(y_true, y_pred):

              precision    recall  f1-score   support

           0       0.50      0.67      0.57         3
           1       0.50      0.33      0.40         3

    accuracy                           0.50         6
   macro avg       0.50      0.50      0.49         6
weighted avg       0.50      0.50      0.49         6
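
Tying this back to the code in the question: the formulas from the video compute these same per-class vectors directly from the full confusion matrix. A small sketch on the same toy data as above (not the video's data) to show the correspondence:

import numpy as np
from sklearn.metrics import confusion_matrix, multilabel_confusion_matrix

y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 0, 1]

cm = confusion_matrix(y_true, y_pred)
mcm = multilabel_confusion_matrix(y_true, y_pred)

# The video's formulas, applied to the full C x C matrix
FP = cm.sum(axis=0) - np.diag(cm)   # column sum minus diagonal
FN = cm.sum(axis=1) - np.diag(cm)   # row sum minus diagonal
TP = np.diag(cm)
TN = cm.sum() - (FP + FN + TP)

# They match the per-class entries of multilabel_confusion_matrix
assert (FP == mcm[:, 0, 1]).all()
assert (FN == mcm[:, 1, 0]).all()
assert (TP == mcm[:, 1, 1]).all()
assert (TN == mcm[:, 0, 0]).all()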
– Alexander L. Hayes