56

Is there a built-in way to get accuracy scores for each class separately? I know that in sklearn we can get the overall accuracy with metrics.accuracy_score. Is there a way to get the breakdown of accuracy scores for individual classes, something similar to metrics.classification_report?

from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score

y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
target_names = ['class 0', 'class 1', 'class 2']

classification_report does not give accuracy scores:

print(classification_report(y_true, y_pred, target_names=target_names, digits=4))

Out[9]:
             precision    recall  f1-score   support

    class 0     0.5000    1.0000    0.6667         1
    class 1     0.0000    0.0000    0.0000         1
    class 2     1.0000    0.6667    0.8000         3

avg / total     0.7000    0.6000    0.6133         5

Accuracy score gives only the overall accuracy:

accuracy_score(y_true, y_pred)
Out[10]: 0.59999999999999998
CentAu

11 Answers

33
from sklearn.metrics import confusion_matrix

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]

matrix = confusion_matrix(y_true, y_pred)
# Correct counts per class (diagonal) divided by the number of samples per true class (row sums)
matrix.diagonal() / matrix.sum(axis=1)
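As the comments below discuss, with scikit-learn's convention (rows are the true labels, columns the predictions) this expression matches the per-class recall, while dividing by matrix.sum(axis=0) would instead give the per-class precision. A quick sanity check against recall_score, sketched with the variables defined above:

from sklearn.metrics import recall_score

print(matrix.diagonal() / matrix.sum(axis=1))      # [1.         0.         0.66666667]
print(recall_score(y_true, y_pred, average=None))  # [1.         0.         0.66666667]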
javac
  • To elaborate, assuming the column (hence this answer's `axis=1`) represents the actual classes while the row represents the predicted classes, the accuracy for class i is the ii element of the confusion matrix over the sum of the ith column. The math just works out that way. – Mong H. Ng Mar 02 '19 at 03:00
  • This [link](https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology) provides a very good explanation of the confusion matrix. – Mong H. Ng Mar 02 '19 at 03:01
  • I believe this code might be incorrect. `sklearn.metrics.confusion_matrix` has the rows as the actual classes instead, so we should use `axis=0`. – Mong H. Ng Mar 02 '19 at 03:04
  • Isn't this equivalent to the precision? And with `axis=0` you get the recall, no? – Federico Gentile Jan 03 '22 at 17:18
  • I think this is calculating recall and not accuracy. – Janikas Aug 15 '22 at 01:07
  • This is incorrect, as it computes recall (with `sum(axis=1)`) or precision (with `sum(axis=0)`). The answer by Ophir should be correct (https://stackoverflow.com/a/65673016/777706). – Oriol Nieto Feb 08 '23 at 20:21
  • I have already obtained recall and precision values using `precision_score` and `recall_score` from `sklearn.metrics`. When I compare them with the answer by javac above, `axis=1` gives the recall values and `axis=0` gives the precision values. Does anyone have a more accurate solution? The answer by Ophir, as pointed out by others, gives unrealistically high values for my data. – Maryam Nasseri Apr 15 '23 at 12:30
15

You can use sklearn's confusion matrix to get the accuracy

from sklearn.metrics import confusion_matrix
import numpy as np

y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
target_names = ['class 0', 'class 1', 'class 2']

# Get the confusion matrix
cm = confusion_matrix(y_true, y_pred)
# array([[1, 0, 0],
#        [1, 0, 0],
#        [0, 1, 2]])

# Normalize each row so the diagonal holds each class's fraction of correct predictions
cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
# array([[1.        , 0.        , 0.        ],
#        [1.        , 0.        , 0.        ],
#        [0.        , 0.33333333, 0.66666667]])

# The diagonal entries are the accuracies of each class
cm.diagonal()
# array([1.        , 0.        , 0.66666667])
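As a commenter notes below, for scikit-learn >= 0.22 the row normalization can be done by confusion_matrix itself. A minimal sketch of that shortcut with the same data:

from sklearn.metrics import confusion_matrix

y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]

confusion_matrix(y_true, y_pred, normalize='true').diagonal()
# array([1.        , 0.        , 0.66666667])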

Gambit1614
  • This calculates not accuracy, but recall. – strohne Aug 26 '18 at 11:52
  • Nope, it is true. The docs (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html) state: "[1] Wikipedia entry for the Confusion matrix (Wikipedia and other references may use a different convention for axes)". – canbax Apr 12 '19 at 20:42
  • Hint: for sklearn >= 0.22, use `confusion_matrix(..., normalize="true").diagonal()` to compute the per-class accuracies directly. – normanius Feb 08 '21 at 17:17
  • @strohne As if the confusion matrix wasn't confusing enough, don't make it worse :) The above correctly computes the per-class accuracies, that is, the ratio of correctly classified samples per class. Recall is the per-class accuracy of the positive class, which should not be confused with the overall accuracy (ratio of correct predictions across all classes). Overall accuracy can be calculated as `confusion_matrix(..., normalize="all").diagonal().sum()`. – normanius Feb 08 '21 at 17:26
  • For multi-class classification, per-class accuracy is the same as per-class recall. – Ophir S Apr 07 '21 at 09:00
7

I am adding my answer as I haven't found any answer to this exact question online, and because I think that the other calculation methods suggested here before me are incorrect.

Remember that accuracy is defined as:

accuracy = (true_positives + true_negatives) / all_samples

Or to put it into words; it is the ratio between the number of correctly classified examples (either positive or negative) and the total number of examples in the test set.

One thing that is important to note is that for both TN and FN, the "negative" is class agnostic, meaning "not predicted as the specific class in question". For example, consider the following:

y_true = ['cat', 'dog', 'bird', 'bird']
y_pred = ['cat', 'dog', 'cat', 'dog']

Here, both the second 'cat' prediction and the second 'dog' prediction are false negatives simply because they are not 'bird'.

To your question:

As far as I know, there is currently no package that provides a method that does what you are looking for, but based on the definition of accuracy, we can use the confusion matrix method from sklearn to calculate it ourselves.

from sklearn.metrics import confusion_matrix
import numpy as np

# Get the confusion matrix
cm = confusion_matrix(y_true, y_pred)

# The class labels, in the order used by confusion_matrix (sorted unique labels)
classes = sorted(set(y_true) | set(y_pred))

# We will store the results in a dictionary for easy access later
per_class_accuracies = {}

# Calculate the accuracy for each one of our classes
for idx, cls in enumerate(classes):
    # True negatives are all the samples that are not our current GT class (not the current row)
    # and were not predicted as the current class (not the current column)
    true_negatives = np.sum(np.delete(np.delete(cm, idx, axis=0), idx, axis=1))

    # True positives are all the samples of our current GT class that were predicted as such
    true_positives = cm[idx, idx]

    # The accuracy for the current class is the ratio between correct predictions to all predictions
    per_class_accuracies[cls] = (true_positives + true_negatives) / np.sum(cm)
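For illustration, running the snippet above on the small 'cat'/'dog'/'bird' example (with the classes taken as the sorted labels, i.e. ['bird', 'cat', 'dog']) should give something like:

print(per_class_accuracies)
# {'bird': 0.5, 'cat': 0.75, 'dog': 0.75}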

The original question was posted a while ago, but this might help anyone who comes here through Google, like me.

Ophir S
  • This answer is one of the few correct ones on this question. You can simplify it by doing this one-liner: `{name: accuracy_score(np.array(y_true) == i, np.array(y_pred) == i) for i, name in enumerate(target_names)}` – Oriol Nieto Feb 08 '23 at 20:19
5

You can code it yourself: the accuracy is nothing more than the ratio between the correctly classified samples (true positives and true negatives) and the total number of samples you have.

Then, for a given class, instead of considering all the samples, you only take into account those of your class.

You can then try this: Let's first define a handy function.

def indices(l, val):
    retval = []
    last = 0
    while val in l[last:]:
        i = l[last:].index(val)
        retval.append(last + i)
        last += i + 1
    return retval

The function above returns the indices in the list l at which a certain value val occurs.

import numpy as np

def class_accuracy(y_pred, y_true, cls):
    # Keep only the samples whose true label is the class of interest
    index = indices(y_true, cls)
    y_pred = [y_pred[i] for i in index]
    y_true = [y_true[i] for i in index]
    # Count how many of them were predicted correctly
    tp = np.sum([1 for k in range(len(y_pred)) if y_true[k] == y_pred[k]])
    return tp / float(len(y_pred))

The last function will return the in-class accuracy that you look for.
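For example, with the data from the question, a hypothetical call for class 2 would look like this:

y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
class_accuracy(y_pred, y_true, 2)  # 2 of the 3 class-2 samples are correct -> 0.666...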

MMF
  • I don't know if there is an existing function in numpy that returns the indices of values in a list that match your argument. Does anyone have an idea ? Thanks a lot ! – MMF Sep 29 '16 at 12:53
  • The second function uses an `l` which is not defined. There's also a typo when assigning y_true and y_pred again. (Yet I've upvoted it since it is the only correct answer here.) – Tommaso Guerrini May 15 '20 at 11:26
3

Your question makes no sense. Accuracy is a global measure, and there is no such thing as class-wise accuracy. The suggestion to normalize by true cases (rows) yields something called the true-positive rate, sensitivity or recall, depending on the context. Likewise, if you normalize by predictions (columns), it's called precision or positive predictive value.

user11130854
  • Yes, I don't think there is a measure called "per-class" accuracy, because mathematically it doesn't make sense. Accuracy is defined as `(TP + TN) / total_population`; the problem is how to calculate the "total population" of a specific class. – tjysdsg Mar 27 '21 at 12:48
2

The question is misleading. Accuracy scores for each class equal the overall accuracy score. Consider the confusion matrix:

from sklearn.metrics import confusion_matrix
import numpy as np

y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]

#Get the confusion matrix
cm = confusion_matrix(y_true, y_pred)
print(cm)

This gives you:

[[1 0 0]
 [1 0 0]
 [0 1 2]]

Accuracy is calculated as the proportion of correctly classified samples to all samples:

accuracy = (TP + TN) / (P + N)

Regarding the confusion matrix, the numerator (TP + TN) is the sum of the diagonal. The denominator is the sum of all cells. Both are the same for every class.
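In code, that observation amounts to the following one-liner (a sketch reusing the cm computed above):

overall_accuracy = cm.diagonal().sum() / cm.sum()  # 3 / 5 = 0.6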

strohne
2

In my opinion, accuracy is a generic term that has different dimensions, e.g. precision, recall, f1-score (or even specificity, sensitivity), etc., which provide accuracy measures from different perspectives. Hence, the function classification_report outputs a range of accuracy measures for each class. For instance, recall provides the proportion of accurately retrieved instances (i.e. true positives) out of the total number of instances of a particular class (true positives plus false negatives).

  • Hi Md Abdul Bari, welcome to Stack Overflow! I fully agree with your reply. Even though it doesn't answer the question CentAu asked, one cannot stress enough that there are many accuracy metrics with different meanings. – Esteis Oct 20 '20 at 10:37
  • I strongly disagree. Accuracy is very much standardized terminology. Sure you can loosely talk about accuracy in different contexts, but for classification, it has a very specific, well-defined meaning. – user11130854 Jan 20 '22 at 16:30
0

Here is a solution:

import numpy as np
import pandas as pd

def classwise_accuracy(y_true, y_pred):
    # Diagonal of the cross-tabulation = correct predictions per class,
    # row sums = total samples per class (assumes every class appears in both arguments)
    a = pd.crosstab(pd.Series(y_true), pd.Series(y_pred))
    return np.diag(a) / a.sum(axis=1)

print(classwise_accuracy(y_test, predict_over))
0

For the multilabel case you can use this

from sklearn.metrics import multilabel_confusion_matrix
import numpy as np

def get_accuracies(true_labels, predictions):
    #https://scikit-learn.org/stable/modules/generated/sklearn.metrics.multilabel_confusion_matrix.html
    cm = multilabel_confusion_matrix(true_labels, predictions)
    total_count = true_labels.shape[0]
    accuracies = []
    for i in range(true_labels.shape[1]):
        true_positive_count = np.sum(cm[i,1,1]).item()
        true_negative_count = np.sum(cm[i,0,0]).item()
        accuracy = (true_positive_count + true_negative_count) / total_count
        accuracies.append(accuracy)
    return accuracies
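A hypothetical usage example with a tiny binary indicator matrix (3 samples, 2 labels), assuming numpy arrays as inputs:

y_true = np.array([[1, 0], [1, 1], [0, 1]])
y_pred = np.array([[1, 0], [0, 1], [0, 1]])
print(get_accuracies(y_true, y_pred))  # [0.666..., 1.0]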
gary69
0

No, there is no built-in way to get accuracy scores for each class separately. But you can use the following snippet to get the accuracy, sensitivity, and specificity for a given class.

import numpy as np

def class_matric(confusion_matrix, class_id):
    """
    confusion_matrix: confusion matrix of a multi-class classification

    class_id: id of a particular class
    """
    confusion_matrix = np.asarray(confusion_matrix, dtype=np.float64)
    TP = confusion_matrix[class_id, class_id]
    FN = np.sum(confusion_matrix[class_id]) - TP
    FP = np.sum(confusion_matrix[:, class_id]) - TP
    TN = np.sum(confusion_matrix) - TP - FN - FP

    # sensitivity = 0 if TP == 0
    if TP != 0:
        sensitivity = TP / (TP + FN)
    else:
        sensitivity = 0.

    specificity = TN / (TN + FP)
    accuracy = (TP + TN) / (TP + FP + FN + TN)

    return sensitivity, specificity, accuracy
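A hypothetical usage example, with the confusion matrix built from the question's data:

from sklearn.metrics import confusion_matrix

y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]

cm = confusion_matrix(y_true, y_pred)
sensitivity, specificity, accuracy = class_matric(cm, class_id=2)
# sensitivity = 2/3, specificity = 1.0, accuracy = 0.8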
Adrian Mole
0

You can use sklearn.metrics.classification_report:

from sklearn.metrics import classification_report
# Your code
#.
#.
#.
print(classification_report(y_true, y_pred))

This will show precision, recall and F1 score for each class.

  • Precision is defined as the number of true positives over the number of true positives plus the number of false positives.
  • Recall is defined as the number of true positives over the number of true positives plus the number of false negatives.
  • F1 score is defined as the harmonic mean of precision and recall; the higher, the better (see the small sketch after this list).
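In plain Python, those definitions look roughly like this (hypothetical helper functions, not part of sklearn):

# Hypothetical helpers spelling out the definitions above
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)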

You can also check out these links from the official documentation:

  1. classification_report
  2. Precision-Recall
arman_aegit