Sklearn - plotting classification report gives a different output than basic avg?

Question

I wanted to use the answer to "How to plot scikit learn classification report?" to turn an sklearn classification report into a heatmap.

It all works with their sample report, but my classification report looks slightly different, which breaks the parsing functions.

Their report (notice the avg / total):

sampleClassificationReport =             
                   precision    recall  f1-score   support

          Acacia        0.62      1.00      0.76        66
          Blossom       0.93      0.93      0.93        40
          Camellia      0.59      0.97      0.73        67
          Daisy         0.47      0.92      0.62       272
          Echium        1.00      0.16      0.28       413

        avg / total     0.77      0.57      0.49       858

My report with metrics.classification_report(valid_y, y_pred) :

              precision    recall  f1-score   support

           0       1.00      0.18      0.31        11
           1       0.00      0.00      0.00        14
           2       0.00      0.00      0.00        19
           3       0.50      0.77      0.61        66
           4       0.39      0.64      0.49        47
           5       0.00      0.00      0.00        23

    accuracy                           0.46       180
   macro avg       0.32      0.27      0.23       180
weighted avg       0.35      0.46      0.37       180

The issue, from the selected answer in the heatmap link, is here:

for line in lines[2 : (len(lines) - 2)]:
    t = line.strip().split()
    if len(t) < 2: continue
    classes.append(t[0])
    v = [float(x) for x in t[1: len(t) - 1]]
    support.append(int(t[-1]))
    class_names.append(t[0])
    print(v)
    plotMat.append(v)

Because I get the error:

ValueError: could not convert string to float: 'avg'

So the problem really is how my classification report is formatted. What can I change here to match the sample?
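One way to keep the parsing approach working with the newer layout is to split each row from the right, so that multi-word labels such as `macro avg` stay in one piece, and to skip any row that does not have a label plus four values (the header and the `accuracy` row). This is a minimal sketch, assuming the report layout shown above; `parse_report` is a hypothetical helper name, not part of scikit-learn or the linked answer.

```python
# Report text copied from the question.
REPORT = """\
              precision    recall  f1-score   support

           0       1.00      0.18      0.31        11
           1       0.00      0.00      0.00        14
           2       0.00      0.00      0.00        19
           3       0.50      0.77      0.61        66
           4       0.39      0.64      0.49        47
           5       0.00      0.00      0.00        23

    accuracy                           0.46       180
   macro avg       0.32      0.27      0.23       180
weighted avg       0.35      0.46      0.37       180
"""


def parse_report(report):
    """Return (labels, metric rows, supports) from a text classification report."""
    class_names, plot_mat, support = [], [], []
    for line in report.splitlines():
        # Split from the right: the last four fields are always
        # precision, recall, f1-score and support; the rest is the label.
        parts = line.strip().rsplit(maxsplit=4)
        if len(parts) != 5:
            continue  # blank line, header row, or the shorter 'accuracy' row
        label, *values, count = parts
        class_names.append(label)
        plot_mat.append([float(x) for x in values])
        support.append(int(count))
    return class_names, plot_mat, support


class_names, plot_mat, support = parse_report(REPORT)
print(class_names)
print(plot_mat[3], support[3])
```

Because the label is whatever remains on the left, the same loop should also accept the older `avg / total` row without special handling.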

EDIT: what I've tried:

df = pd.DataFrame(metrics.classification_report(valid_y, y_pred)).T

df['support'] = df.support.apply(int)

df.style.background_gradient(cmap='viridis',
                             subset=pd.IndexSlice['0':'9', :'f1-score'])

Error:

ValueError: DataFrame constructor not properly called!
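This error arises because `classification_report` returns a plain string by default, and `pd.DataFrame` cannot be built from a string. The sketch below uses made-up labels (the `y_true` and `y_pred` lists are illustrative, not the data from the question) to show the difference once `output_dict=True` is passed.

```python
import pandas as pd
from sklearn.metrics import classification_report

# Illustrative labels only; not the data from the question.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

as_text = classification_report(y_true, y_pred)
as_dict = classification_report(y_true, y_pred, output_dict=True)

print(type(as_text).__name__)  # str: pd.DataFrame(as_text) raises ValueError
print(type(as_dict).__name__)  # dict: class labels plus summary entries

# Transpose so each class or summary entry is a row and each metric a column.
df = pd.DataFrame(as_dict).T
print(df)
```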

The folks that answered that thread ~ 5 yrs ago obviously went to some painstakingly detailed parsing of the report output; but, as you can guess by comparing the two reports, scikit-learn has since then changed some of the report details, and that's why you cannot just plug their answer here. — desertnaut, May 10 '20 at 00:04
@desertnaut right. thanks for following my struggling today lol.. to get their heat map to work, what would you suggest? is there a way to tweak their answer or must I find a new way to plot the classification metric? — blue, May 10 '20 at 00:08
There is probably a way, and it shouldn't be difficult - after all, the two classification reports are obviously not *that* different; but it really is not the kind of problem I'm good at - sorry :/ — desertnaut, May 10 '20 at 00:10

Venkatachalam · Answer 1 · 2020-05-11T00:15:06.830

Since classification_report gained the output_dict parameter, there is no need to parse the text report. You can load the returned dict directly into a pd.DataFrame and then use the DataFrame's .style accessor (a pandas Styler) to render the heat map.

Example:

from sklearn.metrics import classification_report
import numpy as np
import pandas as pd

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, GridSearchCV


X, y = make_classification(n_samples=1000, n_features=30,
                           n_informative=12,
                           n_clusters_per_class=1, n_classes=10,
                           class_sep=2.0, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)


clf = LogisticRegression(max_iter=1000, random_state=42).fit(X_train, y_train)



# classification_report expects (y_true, y_pred), in that order
df = pd.DataFrame(classification_report(y_test,
                                        clf.predict(X_test), digits=2,
                                        output_dict=True)).T

df['support'] = df.support.apply(int)

df.style.background_gradient(cmap='viridis',
                             subset=pd.IndexSlice['0':'9', :'f1-score'])

edited May 11 '20 at 00:15

answered May 10 '20 at 07:37

Venkatachalam

Great catch indeed ;) – desertnaut May 10 '20 at 12:11
Thank you, but I think I'm missing something. I copied your example and ran it, and it ran with no errors aside from having to correct LogisticsRegression to LogisticsRegressionCV, but the visuals don't show up. Where should they show? – blue May 10 '20 at 22:40
See my edit.. plugging in my own classification report I get an error – blue May 10 '20 at 22:43
You seem to have missed the 'output_dict' param. Can you try with that? – Venkatachalam May 11 '20 at 00:14
Btw, can you try my example in a Jupyter notebook? – Venkatachalam May 11 '20 at 00:16
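A styled DataFrame is rendered as HTML by the notebook front end, so evaluating `df.style...` in a plain script displays nothing. If a notebook is not available, one possible alternative is to draw the heat map with matplotlib and save it to a file. The sketch below is an assumption-laden illustration, not the answerer's method: the per-class numbers are copied from the report in the question, and the output file name `classification_report.png` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to use plt.show() instead
import matplotlib.pyplot as plt
import pandas as pd

# Per-class rows copied from the question's report, in the shape that
# classification_report(..., output_dict=True) uses for class labels.
report = {
    "0": {"precision": 1.00, "recall": 0.18, "f1-score": 0.31, "support": 11},
    "1": {"precision": 0.00, "recall": 0.00, "f1-score": 0.00, "support": 14},
    "2": {"precision": 0.00, "recall": 0.00, "f1-score": 0.00, "support": 19},
    "3": {"precision": 0.50, "recall": 0.77, "f1-score": 0.61, "support": 66},
    "4": {"precision": 0.39, "recall": 0.64, "f1-score": 0.49, "support": 47},
    "5": {"precision": 0.00, "recall": 0.00, "f1-score": 0.00, "support": 23},
}

df = pd.DataFrame(report).T
metrics = df[["precision", "recall", "f1-score"]].astype(float)

fig, ax = plt.subplots(figsize=(5, 4))
# Fix the colour scale to [0, 1] so colours are comparable across reports.
im = ax.imshow(metrics.to_numpy(), cmap="viridis", vmin=0.0, vmax=1.0, aspect="auto")
ax.set_xticks(range(metrics.shape[1]))
ax.set_xticklabels(metrics.columns)
ax.set_yticks(range(metrics.shape[0]))
# Show the support count next to each class label.
ax.set_yticklabels([f"{name} ({int(n)})" for name, n in zip(df.index, df["support"])])
fig.colorbar(im, ax=ax)
fig.tight_layout()
fig.savefig("classification_report.png")  # hypothetical output file name
```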
