
I have the confusion matrix below, with 10 Y categories. How do I calculate the accuracy for categories A, D, and E, and find TP, TN, FP, and FN for each?

    A    B   C   D   E   F   G   H   I   J
   [41,  0,  0,  2,  1,  0,  0,  0,  0,  4],
   [ 1,  0,  0,  0,  4,  0,  0,  0,  0,  2],
   [ 3,  0, 12,  0,  1,  0,  0,  0,  0,  0],
   [ 0,  0,  0, 51, 10,  0,  0,  0,  0,  0],
   [ 1,  0,  0,  3, 78,  0,  0,  0,  0,  5],
   [ 1,  0,  0,  0,  0,  0,  0,  0,  0,  3],
   [ 4,  0,  0,  0,  2,  0,  5,  0,  0,  4],
   [ 0,  0,  1,  1,  3,  0,  0,  2,  0,  1],
   [ 4,  0,  0,  0,  1,  0,  0,  0,  0,  0],
   [10,  0,  0,  5, 15,  0,  0,  0,  0, 24]

Thank you for the help!

Bangbangbang

2 Answers


Visualise your confusion matrix

import numpy as np
import pandas as pd

X = [[41,  0,  0,  2,  1,  0,  0,  0,  0,  4],
     [ 1,  0,  0,  0,  4,  0,  0,  0,  0,  2],
     [ 3,  0, 12,  0,  1,  0,  0,  0,  0,  0],
     [ 0,  0,  0, 51, 10,  0,  0,  0,  0,  0],
     [ 1,  0,  0,  3, 78,  0,  0,  0,  0,  5],
     [ 1,  0,  0,  0,  0,  0,  0,  0,  0,  3],
     [ 4,  0,  0,  0,  2,  0,  5,  0,  0,  4],
     [ 0,  0,  1,  1,  3,  0,  0,  2,  0,  1],
     [ 4,  0,  0,  0,  1,  0,  0,  0,  0,  0],
     [10,  0,  0,  5, 15,  0,  0,  0,  0, 24]]

cm = pd.DataFrame(X, columns=list("ABCDEFGHIJ"), index=list("ABCDEFGHIJ"))

print(cm)

Output:

    A  B   C   D   E  F  G  H  I   J
A  41  0   0   2   1  0  0  0  0   4
B   1  0   0   0   4  0  0  0  0   2
C   3  0  12   0   1  0  0  0  0   0
D   0  0   0  51  10  0  0  0  0   0
E   1  0   0   3  78  0  0  0  0   5
F   1  0   0   0   0  0  0  0  0   3
G   4  0   0   0   2  0  5  0  0   4
H   0  0   1   1   3  0  0  2  0   1
I   4  0   0   0   1  0  0  0  0   0
J  10  0   0   5  15  0  0  0  0  24

Reading a confusion matrix goes as follows: rows are actual labels, columns are predicted labels. A perfect model would have a diagonal confusion matrix, since it would be correct every time! Read more on confusion matrices.

Here, you can read that your model is sometimes wrong: it predicted A 10 times when the actual label was J. But it's particularly good for category G: all five times it predicted G, it was right!
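Both of those observations can be read straight off the DataFrame (the matrix is repeated here so the snippet is self-contained):

```python
import pandas as pd

X = [[41,  0,  0,  2,  1,  0,  0,  0,  0,  4],
     [ 1,  0,  0,  0,  4,  0,  0,  0,  0,  2],
     [ 3,  0, 12,  0,  1,  0,  0,  0,  0,  0],
     [ 0,  0,  0, 51, 10,  0,  0,  0,  0,  0],
     [ 1,  0,  0,  3, 78,  0,  0,  0,  0,  5],
     [ 1,  0,  0,  0,  0,  0,  0,  0,  0,  3],
     [ 4,  0,  0,  0,  2,  0,  5,  0,  0,  4],
     [ 0,  0,  1,  1,  3,  0,  0,  2,  0,  1],
     [ 4,  0,  0,  0,  1,  0,  0,  0,  0,  0],
     [10,  0,  0,  5, 15,  0,  0,  0,  0, 24]]
cm = pd.DataFrame(X, columns=list("ABCDEFGHIJ"), index=list("ABCDEFGHIJ"))

# row J, column A: actual label J, predicted label A
print(cm.loc["J", "A"])  # 10

# column G: everything the model predicted as G -- all of it on the diagonal
print(cm["G"].sum(), cm.loc["G", "G"])  # 5 5
```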

Category accuracy

A category's accuracy is obtained by counting how many times you predicted it correctly, among all the times you predicted it (this per-category measure is also known as precision):

>>> cm.loc["A", "A"] / cm["A"].sum()
0.6307692307692307

>>> cm.loc["D", "D"] / cm["D"].sum()
0.8225806451612904

>>> cm.loc["E", "E"] / cm["E"].sum()
0.6782608695652174
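If you want all ten categories at once, dividing the diagonal by the column sums gives the same numbers in one shot (a sketch; note that categories that were never predicted, like B, yield 0/0 = NaN):

```python
import numpy as np
import pandas as pd

X = [[41,  0,  0,  2,  1,  0,  0,  0,  0,  4],
     [ 1,  0,  0,  0,  4,  0,  0,  0,  0,  2],
     [ 3,  0, 12,  0,  1,  0,  0,  0,  0,  0],
     [ 0,  0,  0, 51, 10,  0,  0,  0,  0,  0],
     [ 1,  0,  0,  3, 78,  0,  0,  0,  0,  5],
     [ 1,  0,  0,  0,  0,  0,  0,  0,  0,  3],
     [ 4,  0,  0,  0,  2,  0,  5,  0,  0,  4],
     [ 0,  0,  1,  1,  3,  0,  0,  2,  0,  1],
     [ 4,  0,  0,  0,  1,  0,  0,  0,  0,  0],
     [10,  0,  0,  5, 15,  0,  0,  0,  0, 24]]
cm = pd.DataFrame(X, columns=list("ABCDEFGHIJ"), index=list("ABCDEFGHIJ"))

# diagonal (correct predictions) divided by column sums (total predictions)
per_category = pd.Series(np.diag(cm), index=cm.columns) / cm.sum(axis=0)
print(per_category[["A", "D", "E"]])
```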

TP, TN, FP, FN for each

These measures usually make sense in a binary classification setup. For a given category, though, you can adopt a one-vs-all view (the considered category vs all the rest), which turns the problem into a binary one, so these measures can be calculated.
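To make the one-vs-all idea concrete, here is a small sketch (the `one_vs_all` helper is hypothetical, just to illustrate) that collapses the 10x10 matrix into a 2x2 "A vs not-A" matrix:

```python
import pandas as pd

X = [[41,  0,  0,  2,  1,  0,  0,  0,  0,  4],
     [ 1,  0,  0,  0,  4,  0,  0,  0,  0,  2],
     [ 3,  0, 12,  0,  1,  0,  0,  0,  0,  0],
     [ 0,  0,  0, 51, 10,  0,  0,  0,  0,  0],
     [ 1,  0,  0,  3, 78,  0,  0,  0,  0,  5],
     [ 1,  0,  0,  0,  0,  0,  0,  0,  0,  3],
     [ 4,  0,  0,  0,  2,  0,  5,  0,  0,  4],
     [ 0,  0,  1,  1,  3,  0,  0,  2,  0,  1],
     [ 4,  0,  0,  0,  1,  0,  0,  0,  0,  0],
     [10,  0,  0,  5, 15,  0,  0,  0,  0, 24]]
cm = pd.DataFrame(X, columns=list("ABCDEFGHIJ"), index=list("ABCDEFGHIJ"))

def one_vs_all(cm, cat):
    """Collapse a multi-class confusion matrix into a 2x2 cat-vs-rest matrix."""
    tp = cm.loc[cat, cat]                # actual cat, predicted cat
    fn = cm.loc[cat].sum() - tp          # actual cat, predicted something else
    fp = cm[cat].sum() - tp              # predicted cat, actually something else
    tn = cm.values.sum() - tp - fn - fp  # everything else
    return pd.DataFrame([[tp, fn], [fp, tn]],
                        index=[cat, "not " + cat],
                        columns=[cat, "not " + cat])

print(one_vs_all(cm, "A"))
```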

Taking advantage of this answer, you can get all TP, TN, FP, FN values for each category using the following:

import numpy as np

FP = cm.sum(axis=0) - np.diag(cm)
FN = cm.sum(axis=1) - np.diag(cm)
TP = pd.Series(np.diag(cm), index=list("ABCDEFGHIJ"))
TN = cm.values.sum() - (FP + FN + TP)

Now, FP for category A is:

>>> FP["A"]
24  # you can verify: it's the sum of column A minus the diagonal element

The same logic applies to all the other measures.
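As a quick sanity check on those four Series (a sketch, recomputing them on the same matrix): for every category, TP + TN + FP + FN must add up to the total number of samples, since the one-vs-all view puts each sample into exactly one of the four buckets:

```python
import numpy as np
import pandas as pd

X = [[41,  0,  0,  2,  1,  0,  0,  0,  0,  4],
     [ 1,  0,  0,  0,  4,  0,  0,  0,  0,  2],
     [ 3,  0, 12,  0,  1,  0,  0,  0,  0,  0],
     [ 0,  0,  0, 51, 10,  0,  0,  0,  0,  0],
     [ 1,  0,  0,  3, 78,  0,  0,  0,  0,  5],
     [ 1,  0,  0,  0,  0,  0,  0,  0,  0,  3],
     [ 4,  0,  0,  0,  2,  0,  5,  0,  0,  4],
     [ 0,  0,  1,  1,  3,  0,  0,  2,  0,  1],
     [ 4,  0,  0,  0,  1,  0,  0,  0,  0,  0],
     [10,  0,  0,  5, 15,  0,  0,  0,  0, 24]]
cm = pd.DataFrame(X, columns=list("ABCDEFGHIJ"), index=list("ABCDEFGHIJ"))

FP = cm.sum(axis=0) - np.diag(cm)
FN = cm.sum(axis=1) - np.diag(cm)
TP = pd.Series(np.diag(cm), index=list("ABCDEFGHIJ"))
TN = cm.values.sum() - (FP + FN + TP)

# each category partitions all samples into the four buckets
assert ((TP + TN + FP + FN) == cm.values.sum()).all()
print(FP["A"], TN["A"])  # 24 233
```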

arnaud

To add to the other answer: true positives, false positives, and the related metrics only make sense in the context of binary classification. This Wikipedia page outlines it in a little more detail:

Precision and Recall

In the case above, you can't calculate a single overall TP or FP rate, but you can calculate per-category counts such as False 'A' and True 'A', as discussed in the answer above.
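As a sketch of that distinction on the same matrix: per-class precision ("True A" among everything *predicted* A) and recall ("True A" among everything *actually* A) can both be read off the diagonal, using the column and row sums respectively:

```python
import numpy as np
import pandas as pd

X = [[41,  0,  0,  2,  1,  0,  0,  0,  0,  4],
     [ 1,  0,  0,  0,  4,  0,  0,  0,  0,  2],
     [ 3,  0, 12,  0,  1,  0,  0,  0,  0,  0],
     [ 0,  0,  0, 51, 10,  0,  0,  0,  0,  0],
     [ 1,  0,  0,  3, 78,  0,  0,  0,  0,  5],
     [ 1,  0,  0,  0,  0,  0,  0,  0,  0,  3],
     [ 4,  0,  0,  0,  2,  0,  5,  0,  0,  4],
     [ 0,  0,  1,  1,  3,  0,  0,  2,  0,  1],
     [ 4,  0,  0,  0,  1,  0,  0,  0,  0,  0],
     [10,  0,  0,  5, 15,  0,  0,  0,  0, 24]]
cm = pd.DataFrame(X, columns=list("ABCDEFGHIJ"), index=list("ABCDEFGHIJ"))

diag = pd.Series(np.diag(cm), index=cm.columns)
precision = diag / cm.sum(axis=0)  # correct predictions / all predictions of the class
recall = diag / cm.sum(axis=1)     # correct predictions / all actual members of the class

for cat in "ADE":
    print(f"{cat}: precision={precision[cat]:.3f}, recall={recall[cat]:.3f}")
```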

lincolnck