
How do I calculate accuracy, precision, and recall for each class from a confusion matrix? I am using the built-in iris dataset; the confusion matrix is below:

prediction   setosa versicolor virginica
setosa         29          0         0
versicolor      0         20         2
virginica       0          3        21

I am using 75 entries as the training set and the remaining 75 for testing:

iris.train <- sample(1:150, 75) # randomly select 75 of the 150 row indices for training
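For context, the test indices are just the complement of the training indices. Below is a minimal sketch of how a confusion matrix like the one above could be produced; the fitted model `fit` and its `predict` call are hypothetical placeholders:

iris.test <- setdiff(1:150, iris.train)       # the remaining 75 rows for testing
# pred <- predict(fit, iris[iris.test, ])     # 'fit' is a hypothetical fitted classifier
# mat <- table(pred, iris$Species[iris.test]) # rows = predictions, columns = true classes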

1 Answer


Throughout this answer, mat is the confusion matrix that you describe.

You can calculate and store accuracy with:

(accuracy <- sum(diag(mat)) / sum(mat))
# [1] 0.9333333

Precision for each class (assuming the predictions are on the rows and the true outcomes are on the columns) can be computed with:

(precision <- diag(mat) / rowSums(mat))
#     setosa versicolor  virginica 
#  1.0000000  0.9090909  0.8750000 

If you wanted to grab the precision for a particular class, you could do:

(precision.versicolor <- precision["versicolor"])
# versicolor 
#  0.9090909 

Recall for each class (again assuming the predictions are on the rows and the true outcomes are on the columns) can be calculated with:

(recall <- diag(mat) / colSums(mat))
#     setosa versicolor  virginica 
#  1.0000000  0.8695652  0.9130435 

If you wanted recall for a particular class, you could do something like:

(recall.virginica <- recall["virginica"])
# virginica 
# 0.9130435 

If instead you had the true outcomes as the rows and the predicted outcomes as the columns, then you would flip the precision and recall definitions.
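If you also want an F1 score, which is the harmonic mean of precision and recall, here is a minimal sketch under the same row/column convention, using the precision and recall vectors computed above (macro averaging is just the mean of the per-class scores):

(f1 <- 2 * precision * recall / (precision + recall))
#     setosa versicolor  virginica 
#  1.0000000  0.8888889  0.8936170 

(macro.f1 <- mean(f1))
# [1] 0.927502

Macro averaging weights every class equally; micro averaging instead pools the TP/FP/FN counts across classes and, for single-label multiclass data like this, reduces to the overall accuracy computed above.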

Data:

(mat <- as.matrix(read.table(text="  setosa versicolor virginica
 setosa         29          0         0
 versicolor      0         20         2
 virginica       0          3        21", header=TRUE)))
#            setosa versicolor virginica
# setosa         29          0         0
# versicolor      0         20         2
# virginica       0          3        21
josliber
  • Is it possible to give an overall F score for such data by applying averages? – mlee_jordan May 06 '16 at 12:49
  • @mlee_jordan Yes you can. One resource for further investigation is the scikit-learn manual: http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics. There may be better, more general resources out there, though. In fact, if you compute the F-score on a multiclass case, it will automatically compute the per-class score and average. In other cases, like recall, you have the option to compute micro averages (count all TP, FN, FP and compute the score) or macro averages (compute the score per class and average) when computing the score. – Cerno Feb 17 '17 at 14:51
  • How would you calculate accuracy for a single class? Would it be the same for each class? – spacedustpi Dec 16 '20 at 11:58
  • @spacedustpi Interesting one! That would be `1-(rowSums(mat)+colSums(mat)-2*diag(mat))/sum(mat)` (see the sketch after this comment thread). No, it can be different for every class. For instance, you have perfect accuracy for setosa here (you always got "setosa" vs. "not setosa" correct). However, you're not perfect for the other two classes. – josliber Dec 16 '20 at 18:24
  • @josliber Using this formula, the accuracy scores are the same, 93.3%, for both versicolor and virginica, and 100% for setosa. Overall accuracy is 93.3%. I can't wrap my head around why this is. Can you explain it? – spacedustpi Dec 20 '20 at 21:01
  • @spacedustpi Overall accuracy measures if you got the class labels exactly correct, while accuracy for a single class just measures if you got the "yes/no" decision for that class correct. So overall accuracy will always be <= the accuracy for a single class. – josliber Dec 21 '20 at 13:47
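As a check, applying the per-class accuracy formula from the comment above to mat reproduces these numbers:

(class.accuracy <- 1 - (rowSums(mat) + colSums(mat) - 2 * diag(mat)) / sum(mat))
#     setosa versicolor  virginica 
#  1.0000000  0.9333333  0.9333333 

Setosa is never confused with another class, so its one-vs-rest accuracy is perfect, while each of the five versicolor/virginica mix-ups counts against both of those classes, giving 70/75 = 0.9333 for each.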