How to calculate TPR and FPR in Python without using sklearn?

Question

Import pandas and initialize the list of lists:

import pandas as pd

data = [[1.0, 0.635165,0.0], [1.0, 0.766586,1.0], [1.0, 0.724564,1.0],
        [1.0, 0.766586,1.0],[1.0, 0.889199,1.0],[1.0, 0.966586,1.0],
        [1.0, 0.535165,0.0],[1.0, 0.55165,0.0],[1.0, 0.525165,0.0],
        [1.0, 0.5595165,0.0] ]

Create the Pandas DataFrame:

df = pd.DataFrame(data, columns = ['y', 'prob','y_predict'])

Print the DataFrame:

print(df)

For this dataset, I want to find:

1. The confusion matrix, without using sklearn.
2. NumPy arrays of TPR and FPR, without using sklearn, for plotting the ROC curve.

How can I do this in Python?

For the calculation of the confusion matrix you can take a look at this question: https://stackoverflow.com/q/61193476/11989081 — Flavia Giammarino, Apr 20 '20 at 12:24
@gflavia...can you suggest for 2. Numpy array of TPR and FPR without using Sklearn, for plotting ROC. — Sahil Kamboj, Apr 20 '20 at 12:43
Take a look at this for calculating TPR and FPR : https://stackoverflow.com/a/29910634/13149719 — baby_yoda, Apr 20 '20 at 13:23

Flavia Giammarino · Accepted Answer · 2021-10-15T06:17:04.390

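You can calculate the false positive rate and true positive rate associated with different threshold levels as follows (y_true and y_prob should be NumPy arrays, since the function relies on element-wise comparisons):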

import numpy as np

def roc_curve(y_true, y_prob, thresholds):

    fpr = []
    tpr = []

    for threshold in thresholds:

        y_pred = np.where(y_prob >= threshold, 1, 0)

        fp = np.sum((y_pred == 1) & (y_true == 0))
        tp = np.sum((y_pred == 1) & (y_true == 1))

        fn = np.sum((y_pred == 0) & (y_true == 1))
        tn = np.sum((y_pred == 0) & (y_true == 0))

        fpr.append(fp / (fp + tn))
        tpr.append(tp / (tp + fn))

    return [fpr, tpr]
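
One caveat with the question's sample data: every value in column y is 1.0, so fp + tn is zero and the FPR is undefined there. The following is only a sketch of how the same per-threshold calculation could guard against that; the helper name rates_at_threshold and the four-sample labels and scores are hypothetical, chosen for illustration and not taken from the question.

```python
import numpy as np

def rates_at_threshold(y_true, y_prob, threshold):
    """Return (fpr, tpr) at one threshold; a rate is NaN if its denominator is zero."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)

    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))

    # Guard against a class being absent from y_true
    fpr = fp / (fp + tn) if (fp + tn) > 0 else float("nan")
    tpr = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    return fpr, tpr

# Hypothetical labels and scores (not the question's data)
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])
thresholds = [0.0, 0.3, 0.5, 1.0]

points = [rates_at_threshold(y_true, y_prob, t) for t in thresholds]
fpr = np.array([p[0] for p in points])  # 1.0, 0.5, 0.0, 0.0
tpr = np.array([p[1] for p in points])  # 1.0, 1.0, 0.5, 0.0
```

Returning NaN rather than raising is a design choice; depending on the use case, it may be preferable to raise an error when one of the classes is missing.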

score 0 · Answer 2 · answered Apr 20 '20 at 12:30

Without the sklearn Python module:

For the confusion matrix:
- You can use pandas_ml:

  from pandas_ml import ConfusionMatrix
- You can implement the confusion-matrix formulas yourself.

For the ROC curve:
- See the Python and MATLAB examples that solve this problem.
- You can build the arrays yourself with NumPy by implementing the formulas directly.
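
As a rough illustration of the "build it yourself" option, here is a minimal NumPy-only sketch of a binary confusion matrix, applied to the y and y_predict columns from the question. The function name confusion_matrix and the [[tn, fp], [fn, tp]] layout are choices made for this example, and the labels are assumed to be 0 or 1.

```python
import numpy as np

def confusion_matrix(y_true, y_pred):
    """2x2 count matrix: rows are actual 0/1, columns are predicted 0/1."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    cm = np.zeros((2, 2), dtype=int)
    # Add 1 to cell (actual, predicted) for every sample
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# 'y' and 'y_predict' columns from the question's data
y = [1.0] * 10
y_predict = [0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]

cm = confusion_matrix(y, y_predict)
tn, fp, fn, tp = cm.ravel()
print(cm)
```

With this data the matrix has no actual negatives (tn = fp = 0), which is why an FPR cannot be computed from the question's sample alone.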

To learn more, take a look at these articles:

logistic-regression-using-numpy - python examples regression;

what-is-the-roc-curve - theory;

roc-curve-part-2-numerical-example - python practice;

Cătălin George Feștilă


I just need the function that can give me the NumPy array of TPR & FPR separately. I know how to plot ROC. I can use numpy.trapz(tpr_array, fpr_array) for the auc_score, if I had the required arrays. – Sahil Kamboj Apr 20 '20 at 12:41
Sorry, I don't know a specific function for these issues. The input data for the TPR and FPR arrays gives the graph for the ROC. "I just need the function that can give me the NumPy array of TPR & FPR separately." - so you don't have input data and you don't know the theory. – Cătălin George Feștilă Apr 20 '20 at 14:25
it's ok, I got it. Thanks – Sahil Kamboj Apr 20 '20 at 15:00
no problem, give your vote and rate the answers for each response, this will help users to understand your problem into an area of answers. – Cătălin George Feștilă Apr 20 '20 at 15:40

score 0 · Answer 3 · answered Dec 28 '20 at 23:09

import numpy as np

def calculate_cm(predicted, actual):
  # Count each confusion-matrix cell for binary 0/1 NumPy arrays
  fp = np.sum((predicted == 1) & (actual == 0))
  tp = np.sum((predicted == 1) & (actual == 1))

  fn = np.sum((predicted == 0) & (actual == 1))
  tn = np.sum((predicted == 0) & (actual == 0))
  return tp, fp, fn, tn

def calculate_recall(tp, fp, fn, tn):
  return (tp)/(tp + fn)

def calculate_fallout(tp, fp, fn, tn):
  return (fp)/(fp + tn)

def calculate_at_threshold(threshold, actual, predicted):
  p = np.where(predicted >= threshold, 1, 0)
  tp, fp, fn, tn = calculate_cm(p, actual)
  tpr = calculate_recall(tp, fp, fn, tn)
  fpr = calculate_fallout(tp, fp, fn, tn)
  return fpr, tpr 

def roc_curve(actual, predicted, thresholds):
  tpr = []
  fpr = []
  for threshold in thresholds:
    fpr_t, tpr_t = calculate_at_threshold(threshold, actual, predicted)
    tpr.append(tpr_t)
    fpr.append(fpr_t)
  return fpr, tpr

score 0 · Answer 4 · answered Oct 12 '21 at 08:57

This is a slightly faster version of Flavia Giammarino's answer which only uses NumPy arrays; I also added a few comments and provided alternative, more generic variable names:

import numpy as np

def roc_curve(ground_truth, probabilities, thresholds):
    # ground_truth: 0/1 labels; probabilities: predicted scores (both NumPy arrays)

    # Initialize FPR & TPR arrays (float dtype, even if thresholds are integers)
    fpr = np.empty(len(thresholds), dtype=float)
    tpr = np.empty(len(thresholds), dtype=float)

    # Compute FPR & TPR at each threshold
    for t in range(len(thresholds)):
        y_pred = np.where(probabilities >= thresholds[t], 1, 0)
        fp = np.sum((y_pred == 1) & (ground_truth == 0))
        tp = np.sum((y_pred == 1) & (ground_truth == 1))
        fn = np.sum((y_pred == 0) & (ground_truth == 1))
        tn = np.sum((y_pred == 0) & (ground_truth == 0))
        fpr[t] = fp / (fp + tn)
        tpr[t] = tp / (tp + fn)

    return fpr, tpr

Thresholds can be easily generated with a function like NumPy's linspace:

np.linspace(start, end, n)

where [start, end] is the thresholds' range (extremes included; should be start = 0 and end = 1) and n is the number of thresholds; from experience I can say that n = 50 is a good trade-off between speed and accuracy, although n >= 100 yields smoother curves.
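
To tie the pieces together, here is a sketch that generates thresholds with np.linspace, computes the rates, and estimates the AUC with the trapezoidal rule. It uses broadcasting instead of the loop above, which is a swapped-in technique and not the method from either answer. The name roc_points and the four-sample data are hypothetical, and the code assumes both classes are present in the labels.

```python
import numpy as np

def roc_points(y_true, y_prob, thresholds):
    """Vectorized FPR/TPR for all thresholds; assumes both classes occur in y_true."""
    y_true = np.asarray(y_true).astype(bool)
    y_prob = np.asarray(y_prob, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)

    # pred[i, j] is True when sample j is predicted positive at thresholds[i]
    pred = y_prob[None, :] >= thresholds[:, None]
    tp = (pred & y_true).sum(axis=1)
    fp = (pred & ~y_true).sum(axis=1)

    tpr = tp / y_true.sum()      # positives: tp + fn
    fpr = fp / (~y_true).sum()   # negatives: fp + tn
    return fpr, tpr

# Hypothetical labels and scores, for illustration only
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])

thresholds = np.linspace(0, 1, 101)
fpr, tpr = roc_points(y_true, y_prob, thresholds)

# Thresholds ascend, so FPR descends; reverse both before integrating
fpr_inc, tpr_inc = fpr[::-1], tpr[::-1]
auc = np.sum(np.diff(fpr_inc) * (tpr_inc[:-1] + tpr_inc[1:]) / 2)
print(auc)
```

The broadcasted comparison builds an array of shape (number of thresholds, number of samples), so for very large inputs the explicit loop may be the more memory-friendly option.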
