
Suppose I have a confusion matrix like the one below. How can I calculate precision and recall?

[image: confusion matrix]

– vahideh

6 Answers


First, your matrix is arranged upside down. You want to arrange your labels so that the true positives sit on the diagonal [(0,0), (1,1), (2,2)]; this is the arrangement you will find in confusion matrices generated by sklearn and other packages.
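For reference, sklearn's confusion_matrix already follows this convention: rows are true labels, columns are predicted labels, so correct predictions land on the diagonal. A minimal sketch:

from sklearn.metrics import confusion_matrix

y_true = [0, 1, 2, 2]
y_pred = [0, 1, 2, 1]

# rows are true labels, columns are predicted labels,
# so every correct prediction falls on the diagonal
print(confusion_matrix(y_true, y_pred))
# [[1 0 0]
#  [0 1 0]
#  [0 1 1]]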

Once we have things sorted in the right direction, we can take a page from this answer and say that:

  1. True positives are on the diagonal.
  2. False positives are the column-wise sums, excluding the diagonal.
  3. False negatives are the row-wise sums, excluding the diagonal.

Then we take the formulas for precision and recall from the sklearn docs, precision = TP / (TP + FP) and recall = TP / (TP + FN), and put it all into code:

import numpy as np

cm = np.array([[2, 1, 0], [3, 4, 5], [6, 7, 8]])

true_pos = np.diag(cm)                     # correct predictions per class
false_pos = np.sum(cm, axis=0) - true_pos  # column sums minus the diagonal
false_neg = np.sum(cm, axis=1) - true_pos  # row sums minus the diagonal

precision = np.sum(true_pos / (true_pos + false_pos))
recall = np.sum(true_pos / (true_pos + false_neg))

Since we remove the true positives to define false_pos/false_neg only to add them back, we can simplify further by skipping a couple of steps:

true_pos = np.diag(cm)
precision = np.sum(true_pos / np.sum(cm, axis=0))
recall = np.sum(true_pos / np.sum(cm, axis=1))
– PabTorre
For future reference: the summation at the end is incorrect (last two lines); it should be a mean (average) to calculate the average precision and average recall. Without the summation, you get an individual precision and recall for each class. – EuWern Jan 17 '20 at 16:10

I don't think you need the summation at the end. Without the summation, your method is correct; it gives the precision and recall for each class.

If you intend to calculate average precision and average recall, you have two options: micro- and macro-averaging.

Read more here http://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html
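For a rough sketch of the difference, reusing the cm, true_pos, false_pos and false_neg arrays from the answer above:

import numpy as np

cm = np.array([[2, 1, 0], [3, 4, 5], [6, 7, 8]])
true_pos = np.diag(cm)
false_pos = np.sum(cm, axis=0) - true_pos
false_neg = np.sum(cm, axis=1) - true_pos

# macro-average: compute the metric per class, then take the unweighted mean
macro_precision = np.mean(true_pos / (true_pos + false_pos))
macro_recall = np.mean(true_pos / (true_pos + false_neg))

# micro-average: pool the counts over all classes, then compute the metric
micro_precision = np.sum(true_pos) / np.sum(true_pos + false_pos)
micro_recall = np.sum(true_pos) / np.sum(true_pos + false_neg)

Note that for a full multiclass confusion matrix, micro-averaged precision and recall coincide: both reduce to the overall accuracy (trace divided by total).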

– gruangly

For the sake of completeness and future reference: given a list of ground-truth labels (gt) and predictions (pd), the following code snippet computes the confusion matrix and then calculates precision and recall.

from sklearn.metrics import confusion_matrix

gt = [1, 1, 2, 2, 1, 0]
pd = [1, 1, 1, 1, 2, 0]

# rows = ground truth, columns = predictions
cm = confusion_matrix(gt, pd)

# compute tp, tp + fn, and tp + fp w.r.t. all classes at once
tp_and_fn = cm.sum(1)  # row sums
tp_and_fp = cm.sum(0)  # column sums
tp = cm.diagonal()

precision = tp / tp_and_fp
recall = tp / tp_and_fn
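As a sanity check, the same per-class numbers should come out of sklearn's precision_recall_fscore_support with average=None:

from sklearn.metrics import precision_recall_fscore_support

gt = [1, 1, 2, 2, 1, 0]
pd = [1, 1, 1, 1, 2, 0]

precision, recall, _, _ = precision_recall_fscore_support(gt, pd, average=None)
print(precision)  # [1.  0.5 0. ]
print(recall)     # [1.         0.66666667 0.        ]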
– EuWern

Given:

hypothetical confusion matrix (cm)

cm = 
[[ 970    1    2    1    1    6   10    0    5    0]
 [   0 1105    7    3    1    6    0    3   16    0]
 [   9   14  924   19   18    3   13   12   24    4]
 [   3   10   35  875    2   34    2   14   19   19]
 [   0    3    6    0  903    0    9    5    4   32]
 [   9    6    4   28   10  751   17    5   24    9]
 [   7    2    6    0    9   13  944    1    7    0]
 [   3   11   17    3   16    3    0  975    2   34]
 [   5   38   10   16    7   28    5    4  830   20]
 [   5    3    5   13   39   10    2   34    5  853]]

Goal:

precision and recall for each class, using map() to perform the element-wise list division.

from operator import truediv
import numpy as np

tp = np.diag(cm)                                   # true positives per class
prec = list(map(truediv, tp, np.sum(cm, axis=0)))  # tp / column sums
rec = list(map(truediv, tp, np.sum(cm, axis=1)))   # tp / row sums
print('Precision: {}\nRecall: {}'.format(prec, rec))

Result:

Precision: [0.959, 0.926, 0.909, 0.913, 0.896, 0.880, 0.941, 0.925, 0.886, 0.877]
Recall:    [0.972, 0.968, 0.888, 0.863, 0.937, 0.870, 0.954, 0.916, 0.861, 0.880]

Please note: 10 classes, so 10 precisions and 10 recalls.
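The same element-wise division can also be done with numpy broadcasting, avoiding the map() call entirely; a minimal alternative sketch, assuming the same cm as above:

import numpy as np

tp = np.diag(cm)
prec = tp / np.sum(cm, axis=0)  # per-class precision
rec = tp / np.sum(cm, axis=1)   # per-class recall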

– Farid Alijani

Agreeing with gruangly and EuWern, I modified PabTorre's solution accordingly to generate precision and recall per class.

Also, given my use case (NER), where a model could:

  1. never predict a class that is present in the input text (i.e. a column of zeros: TP 0, FP 0, FN all), causing a nan in the precision array, or
  2. predict a class that is completely absent from the input text (i.e. a row of zeros: TP 0, FN 0, FP all), causing a nan in the recall array,

I wrap the arrays with numpy.nan_to_num() to convert any nan to zero. This is not a mathematical decision but a per-use-case, functional decision about how to handle never-predicted or never-occurring classes.

import numpy
confusion_matrix = numpy.array([
        [ 5,  0,  0,  0,  0,  3], 
        [ 0,  2,  0,  1,  0,  5],
        [ 0,  0,  0,  3,  5,  7],
        [ 0,  0,  0,  9,  0,  0],
        [ 0,  0,  0,  9, 32,  3],
        [ 0,  0,  0,  0,  0,  0]
        ])
true_positives = numpy.diag(confusion_matrix)
false_positives = numpy.sum(confusion_matrix, axis=0) - true_positives
false_negatives = numpy.sum(confusion_matrix, axis=1) - true_positives

precision = numpy.nan_to_num(numpy.divide(true_positives, (true_positives + false_positives)))
recall = numpy.nan_to_num(numpy.divide(true_positives, (true_positives + false_negatives)))

print(true_positives)       # [ 5  2  0  9 32  0 ]
print(false_positives)      # [ 0  0  0 13  5 18 ]
print(false_negatives)      # [ 3  6 15  0 12  0 ]
print(precision)            # [1. 1. 0. 0.40909091 0.86486486 0. ]
print(recall)               # [0.625 0.25 0. 1. 0.72727273 0. ]
– Leobeeson
This computes recall, precision, specificity, F1 and G-mean for each class, then reports their macro-averages and the overall accuracy:
import numpy as np

n_classes = 3
cm = np.array([[0, 1, 2],
               [5, 4, 3],
               [8, 7, 6]])

sp = []    # per-class specificity
f1 = []    # per-class F1 score
gm = []    # per-class G-mean
sens = []  # per-class recall (sensitivity)
acc = []   # per-class true positives, for overall accuracy

for c in range(n_classes):
    tp = cm[c, c]
    fp = cm[:, c].sum() - tp      # column sum minus the diagonal
    fn = cm[c, :].sum() - tp      # row sum minus the diagonal
    tn = cm.sum() - tp - fp - fn  # everything outside row c and column c

    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    denom = precision + recall
    f1_score = 2 * precision * recall / denom if denom else 0.0  # guard against 0/0
    g_mean = np.sqrt(recall * specificity)

    sp.append(specificity)
    f1.append(f1_score)
    gm.append(g_mean)
    sens.append(recall)
    acc.append(tp)

    print("for class {}: recall {}, specificity {}, precision {}, f1 {}, gmean {}"
          .format(c, round(recall, 4), round(specificity, 4),
                  round(precision, 4), round(f1_score, 4), round(g_mean, 4)))

print("sp: ", np.average(sp))
print("f1: ", np.average(f1))
print("gm: ", np.average(gm))
print("sens: ", np.average(sens))
print("accuracy: ", np.sum(acc) / np.sum(cm))  # trace / total = overall accuracy

– Euroka