
I used joblib.dump to store a machine learning model (21 classes). When I load the model and test it on a hold-out set, I get a value of 0.952380952381, but I don't know which metric it is (accuracy, precision, recall, etc.).

  1. So I computed the confusion matrix and the FP, FN, TN, and TP, using the information from this link.

  2. I also found some code on GitHub.

I compared both results (1 and 2). Both give the same value, Accuracy = 0.995464852608, but this result is different from the one above.

Any ideas? Did I compute TP, FP, TN, and FN correctly?


MY CONFUSION MATRIX

[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0]
 [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]]

MY CODE

import numpy as np
from sklearn.metrics import confusion_matrix

# Testing with the hold-out set
print(loaded_model.score(x_oos, y_oos))
# 0.952380952381  <------ IS IT ACCURACY?

# Calculating the confusion matrix
cm = confusion_matrix(y_oos, y_oos_pred)
cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

# Calculating values according to link 2
FP = cm.sum(axis=0) - np.diag(cm)
FN = cm.sum(axis=1) - np.diag(cm)
TP = np.diag(cm)
TN = 21 - (FP + FN + TP)  # I put 21 because I have 21 classes

# Overall accuracy
ACC = np.mean((TP + TN) / (TP + FP + FN + TN))

print(ACC)
# 0.995464852608  <---- IT IS DIFFERENT FROM THE ABOVE ONE.
  • Which type of model is it? For classifiers it's `accuracy_score`, and for regressors it's mostly `r2_score`, but it may differ for some. Find out the model class and look at the `score()` function in its documentation, and you will get your answer (see the sketch after these comments). – Vivek Kumar Apr 25 '17 at 01:13
  • @VivekKumar Hi. It is for classification; that is why I thought it was the accuracy score. But when I calculated the accuracy score from the confusion matrix, the value was different, and I started wondering what it was. – Aizzaac Apr 25 '17 at 15:14
  • Is this binary classification or multiclass classification? You should post what model you are using; then I may be able to tell more about it. Also post a code snippet showing how you calculate `y_oos_pred`. – Vivek Kumar Apr 25 '17 at 16:26
  • `y_oos_pred = loaded_model.predict(x_oos)` – Aizzaac Apr 25 '17 at 16:49
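
As the comment above notes, for a scikit-learn classifier the default `score()` is the mean accuracy on the given data. A minimal sketch to check this, assuming `loaded_model`, `x_oos`, and `y_oos` are defined as in the question:

from sklearn.metrics import accuracy_score

# For classifiers, score() reports the mean accuracy on (x_oos, y_oos).
print(loaded_model.score(x_oos, y_oos))

# Computing the same thing explicitly from the predictions
# should print an identical value.
y_oos_pred = loaded_model.predict(x_oos)
print(accuracy_score(y_oos, y_oos_pred))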

2 Answers


Your example is a little bit confusing. If you provided some numbers it would be easier to understand and answer; for example, just printing `cm` would be very helpful.

That being said, the way to deconstruct a sklearn.metrics.confusion_matrix is as follows (for binary classification):

true_neg, false_pos, false_neg, true_pos = confusion_matrix(y_oos, y_oos_pred).ravel()

For multiple classes the four-value unpacking above no longer applies (a 21x21 matrix ravels to 441 values); I think the result is closer to what you have, but with the values summed, like so:

trues = np.diag(cm).sum()
falses = (cm.sum(0) - np.diag(cm)).sum()

Then you can just compute the accuracy with:

ACC = trues / (trues + falses)

**Update**

From your edited question I can now see that your confusion matrix contains 21 total samples, of which 20 were correctly classified. In that case your accuracy is:

$\frac{20}{21} = 0.95238$

This is the value printed by the model's `score` method. So you are measuring accuracy; you just aren't reproducing it correctly.

N.B. Sorry for the LaTeX, but hopefully one day Stack Overflow will implement it.
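
For reference, a minimal sketch of reproducing that value from the confusion matrix itself, assuming the `cm` computed in the question:

import numpy as np

# Overall accuracy = correctly classified samples / all samples.
# For the matrix above this is 20 / 21 = 0.95238...
overall_acc = np.trace(cm) / float(cm.sum())
print(overall_acc)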

  • are you sure of the "trues" and "falses" for a multiclass problem? I used this link to calculate them: http://stackoverflow.com/questions/31324218/scikit-learn-how-to-obtain-true-positive-true-negative-false-positive-and-fal – Aizzaac Apr 25 '17 at 16:51
  • So in my experience you don't calculate TP, FP, TN, and FN for all of the classes, but you can for each individual class. Take a look at [How do you calculate precision and recall for multiclass classification](https://stats.stackexchange.com/questions/51296/how-do-you-calculate-precision-and-recall-for-multiclass-classification-using-co/51301#51301) and [Precision and recall in a multiclass classification system](https://stats.stackexchange.com/questions/48036/precision-and-recall-in-a-multi-class-classification-system) for more info. – Grr Apr 25 '17 at 17:14
  • Based on your edit to your question you are calculating `ACC` incorrectly. – Grr Apr 25 '17 at 17:15
  • The first ACC, 0.95238, is for all classes. The second ACC, 0.99546, is the average of the ACCs from each class. – Aizzaac Apr 25 '17 at 18:12

Both are Accuracy.

The first one is the overall accuracy: the number of correctly classified samples divided by the total number of samples (20/21).

The second one is the average of the accuracies of each class: add all of these values and divide by 21: [0.9524 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.9524 1 1 1]
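
A minimal sketch of the two calculations side by side, assuming the `cm`, `TP`, `TN`, `FP`, and `FN` arrays from the question:

import numpy as np

# Overall accuracy: one number for the whole hold-out set.
overall_acc = np.trace(cm) / float(cm.sum())  # 20/21, approx. 0.95238

# Accuracy of each class, then averaged over the 21 classes.
per_class_acc = (TP + TN).astype(float) / (TP + FP + FN + TN)
macro_avg_acc = per_class_acc.mean()

print(overall_acc)    # approx. 0.95238
print(macro_avg_acc)  # approx. 0.99546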

  • I understand the logic, but I disagree with this as a relevant measure of accuracy for the model. It's almost like bootstrapping accuracy to make your model seem more accurate than it really is. In essence you are recounting your positive results 20x over but only ever accounting for your negatives once. Perhaps this is an accepted measure, but it feels statistically disingenuous. – Grr Apr 25 '17 at 18:48
  • I will continue with the one given by scikit (overall accuracy). I did not know it was possible to compute accuracy in 2 ways. I first thought it was recall. – Aizzaac Apr 25 '17 at 19:10