
I have the following classification report for an SVC model computed with sklearn:

Classification report:
              precision    recall  f1-score   support

           0  0.7975000 0.5907407 0.6787234       540
           1  0.6316667 0.8239130 0.7150943       460

    accuracy                      0.6980000      1000
   macro avg  0.7145833 0.7073269 0.6969089      1000
weighted avg  0.7212167 0.6980000 0.6954540      1000

How can I compute the overall precision, recall, and F1-score starting only from the values in the matrix? That is, without importing the functions from sklearn, just computing them "by hand" from the confusion matrix itself?

PwNzDust
  • This article explains it quite well: [link](https://towardsdatascience.com/performance-metrics-confusion-matrix-precision-recall-and-f1-score-a8fe076a2262) – Mattravel Jul 27 '23 at 09:06
  • Does this answer your question? [How to write a confusion matrix](https://stackoverflow.com/questions/2148543/how-to-write-a-confusion-matrix) – PV8 Jul 27 '23 at 11:06

1 Answer


If you already have the confusion matrix, the example below shows how to add things up for the required metrics. It assumes that the columns of the confusion matrix represent the true labels and the rows represent the predictions.

In the first loop, for each label (column) it finds the number of samples (support), true positives, and false negatives. In the second loop, for each prediction (row) it sums up the false positives. You can then combine these counts to get recall (sensitivity), precision, the F1-score, etc.
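
The "overall" rows of your report are then just averages of the per-class values: macro-avg precision, for example, is the unweighted mean (0.7975000 + 0.6316667) / 2 ≈ 0.7145833, while weighted-avg precision weights each class by its support, (540 * 0.7975000 + 460 * 0.6316667) / 1000 ≈ 0.7212167.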

A final check is made against sklearn.metrics.classification_report(). If you run your own data through this, I recommend confirming the results against the sklearn report to ensure things work as expected.

import pandas as pd
import numpy as np

#Synthetic 3-class data and predictions
y_true = ([0]*(11+4+7)
          + [1]*(3+16+2)
          + [2]*(1+5+23))

y_pred = ([0]*11 + [1]*4 + [2]*7
          + [0]*3 + [1]*16 + [2]*2
          + [0]*1 + [1]*5 + [2]*23)

#Question assumes we already have the confusion matrix
# (rows = predictions, columns = true labels; note this is the
# transpose of sklearn's confusion_matrix output)
confusion_matrix = np.array([[11, 3, 1],
                             [4, 16, 5],
                             [7, 2, 23]]) 

#Dataframe for easier viewing of labels and predictions
confusion_df = pd.DataFrame(
    confusion_matrix,
    columns=['is_orange', 'is_apple', 'is_pear'],
    index=['predicted_orange', 'predicted_apple', 'predicted_pear']
)

metrics = {} #for recording metrics, for each class
n_classes = confusion_matrix.shape[0]
for label_idx in range(n_classes):
    metrics[label_idx] = {
        'tp': confusion_matrix[label_idx, label_idx],
        'fn': sum( [confusion_matrix[pred_idx, label_idx]
                    for pred_idx in range(n_classes)
                    if pred_idx != label_idx] ),
        'n_samples': confusion_matrix[:, label_idx].sum()
    }

for pred_idx in range(n_classes):
    metrics[pred_idx].update({
        'fp': sum( [confusion_matrix[pred_idx, label_idx]
                    for label_idx in range(n_classes)
                    if label_idx != pred_idx] )
    })

for cls, cnts in metrics.items():
    metrics[cls].update({
        'precision': cnts['tp'] / (cnts['tp'] + cnts['fp']),
        'recall': cnts['tp'] / (cnts['tp'] + cnts['fn']),
        'f1-score': cnts['tp'] / ( cnts['tp'] + 0.5*(cnts['fp'] + cnts['fn']))
    })

print(metrics)  #Take a look at the computed metrics per class

#Confirm macro scores against
# sklearn.metrics.classification_report()
from sklearn.metrics import classification_report
cr = classification_report(y_true, y_pred)
print(cr)
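
If you also want the aggregated rows that the report prints (accuracy, macro avg, weighted avg), a minimal sketch building on the `metrics` dict above could look like this. It assumes the usual definitions: macro average = unweighted mean over classes, weighted average = support-weighted mean, accuracy = total true positives over total samples.

#Aggregate the per-class metrics into the "overall" numbers
n_total = sum(cnts['n_samples'] for cnts in metrics.values())

#accuracy: correctly classified samples / all samples
accuracy = sum(cnts['tp'] for cnts in metrics.values()) / n_total

#macro average: unweighted mean over classes
macro_avg = {name: np.mean([cnts[name] for cnts in metrics.values()])
             for name in ('precision', 'recall', 'f1-score')}

#weighted average: mean over classes, weighted by support
weighted_avg = {name: sum(cnts[name] * cnts['n_samples']
                          for cnts in metrics.values()) / n_total
                for name in ('precision', 'recall', 'f1-score')}

print('accuracy    :', accuracy)
print('macro avg   :', macro_avg)
print('weighted avg:', weighted_avg)

These numbers should match the accuracy, macro avg, and weighted avg rows printed by classification_report above.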
some3128