
I'm plotting ROC curves and precision-recall curves to evaluate various classification models for a problem I'm working on. I've noticed that scikit-learn has some nice convenience functions for computing such curves.
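For instance, the per-threshold helpers I'm thinking of are things like:

fpr, tpr, thresholds = sklearn.metrics.roc_curve(y_true, y_score)
precision, recall, pr_thresholds = sklearn.metrics.precision_recall_curve(y_true, y_score)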

Is there a method (maybe hidden) that calculates all of these in one call? Or maybe one that returns counts of TP, TN, FP, and FN (from which one could compute arbitrary metrics) together with the associated thresholds?

For example:

fp, tp, fn, tn, thresholds = sklearn.metrics.errors_curve(y_true, y_score)

I could in theory compute precision and recall from the ROC curve (TPR and FPR), because I know the true counts of positives and negatives in my data. But I'd like to use a library to do this so I don't have to worry about the math.
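Concretely, the math I'd rather not hand-roll is roughly this (just a sketch, assuming y_true holds binary 0/1 labels in a NumPy array and fpr, tpr come from roc_curve):

import numpy as np

P = np.sum(y_true == 1)   # actual positives
N = np.sum(y_true == 0)   # actual negatives

recall = tpr                                   # recall equals the TPR
precision = (tpr * P) / (tpr * P + fpr * N)    # TP / (TP + FP)
# (divides by zero at the point where tpr == fpr == 0, i.e. the "no positives predicted" end)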

william_grisaitis
  • This post has a few useful answers: [Scikit-learn: How to obtain True Positive, True Negative, False Positive and False Negative](https://stackoverflow.com/questions/31324218/scikit-learn-how-to-obtain-true-positive-true-negative-false-positive-and-fal) – Mattravel Mar 24 '23 at 02:08
  • Thanks :) I'm looking for something like that, but for each threshold value (which the `sklearn.metrics.*_curve` methods already do) – william_grisaitis Mar 24 '23 at 02:39

1 Answer


As far as I am aware, there is no function in scikit-learn that computes those counts for different thresholds in a single call. However, you can calculate them yourself from the ROC curve:

import numpy as np
from sklearn.metrics import roc_curve

###
# Compute the TPR and FPR for different thresholds
# (y_true: ground-truth binary 0/1 labels, y_score: predicted scores for the positive class)
###
fpr, tpr, thresholds = roc_curve(y_true, y_score)

###
# Calculate the number of positive and negative samples
###
P = np.sum(y_true == 1)
N = np.sum(y_true == 0)

###
# Compute TP, TN, FP, and FN for each threshold
###
TP = tpr * P
TN = (1 - fpr) * N
FP = fpr * N
FN = P - TP

The arrays above give you the counts you want for each of the thresholds returned by `roc_curve`.
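Note that they are floating-point arrays; something like `np.rint(TP).astype(int)` gives integer counts if you need them. As a quick sanity check (a sketch, reusing `y_true` and `y_score` from above), you can compare the derived counts against `confusion_matrix` at a single threshold, since `roc_curve` defines its points by predicting positive whenever `y_score >= thresholds[i]`:

from sklearn.metrics import confusion_matrix

i = len(thresholds) // 2                          # pick an arbitrary threshold to check
y_pred = (y_score >= thresholds[i]).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

# these should agree (up to floating point) with the derived arrays:
# tp == TP[i], tn == TN[i], fp == FP[i], fn == FN[i]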

Tasos