
I wanted to use the answer to "How to plot scikit learn classification report?" to turn an sklearn classification report into a heatmap.

It all works with their sample report, but my classification report looks slightly different, which breaks the parsing functions.

Their report (notice the avg / total):

sampleClassificationReport =             
                   precision    recall  f1-score   support

          Acacia        0.62      1.00      0.76        66
          Blossom       0.93      0.93      0.93        40
          Camellia      0.59      0.97      0.73        67
          Daisy         0.47      0.92      0.62       272
          Echium        1.00      0.16      0.28       413

        avg / total     0.77      0.57      0.49       858

My report with metrics.classification_report(valid_y, y_pred) :

              precision    recall  f1-score   support

           0       1.00      0.18      0.31        11
           1       0.00      0.00      0.00        14
           2       0.00      0.00      0.00        19
           3       0.50      0.77      0.61        66
           4       0.39      0.64      0.49        47
           5       0.00      0.00      0.00        23

    accuracy                           0.46       180
   macro avg       0.32      0.27      0.23       180
weighted avg       0.35      0.46      0.37       180

The issue, from the selected answer in the heatmap link, is here:

for line in lines[2 : (len(lines) - 2)]:
    t = line.strip().split()
    if len(t) < 2: continue
    classes.append(t[0])
    v = [float(x) for x in t[1: len(t) - 1]]
    support.append(int(t[-1]))
    class_names.append(t[0])
    print(v)
    plotMat.append(v)

Because I get the error:

ValueError: could not convert string to float: 'avg'

So the problem really is how my classification report is formatted. What can I change here to match the sample?
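For reference, the old report ended with one summary line ("avg / total"), while the newer one ends with three ("accuracy", "macro avg", "weighted avg"), so the slice lines[2 : (len(lines) - 2)] still reaches "macro avg" and tries to convert 'avg' to a float. Below is a minimal sketch of one way to adapt the loop, assuming class labels contain no spaces and the per-class rows are followed by a blank line. The function name parse_class_rows is hypothetical, and the report text is copied from the six-class report above.

```python
# Report text copied from the question (newer scikit-learn layout).
report = """\
              precision    recall  f1-score   support

           0       1.00      0.18      0.31        11
           1       0.00      0.00      0.00        14
           2       0.00      0.00      0.00        19
           3       0.50      0.77      0.61        66
           4       0.39      0.64      0.49        47
           5       0.00      0.00      0.00        23

    accuracy                           0.46       180
   macro avg       0.32      0.27      0.23       180
weighted avg       0.35      0.46      0.37       180
"""


def parse_class_rows(report_text):
    """Hypothetical helper: parse only the per-class rows of a text report.

    Assumes line 0 is the header, line 1 is blank, and the per-class rows
    end at the next blank line, so the summary rows are never reached.
    """
    class_names, plot_mat, support = [], [], []
    lines = report_text.split("\n")
    for line in lines[2:]:
        tokens = line.strip().split()
        if not tokens:
            break  # blank line: end of the per-class block
        class_names.append(tokens[0])
        plot_mat.append([float(x) for x in tokens[1:-1]])
        support.append(int(tokens[-1]))
    return class_names, plot_mat, support


class_names, plot_mat, support = parse_class_rows(report)
print(class_names)
print(plot_mat)
print(support)
```

Stopping at the first blank line instead of counting lines from the end means the loop does not depend on how many summary rows a given scikit-learn version prints.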

EDIT: what I've tried:

df = pd.DataFrame(metrics.classification_report(valid_y, y_pred)).T

df['support'] = df.support.apply(int)

df.style.background_gradient(cmap='viridis',
                             subset=pd.IndexSlice['0':'9', :'f1-score'])

Error:

ValueError: DataFrame constructor not properly called!

blue
  • The folks that answered that thread ~ 5 yrs ago obviously went to some painstakingly detailed parsing of the report output; but, as you can guess by comparing the two reports, scikit-learn has since then changed some of the report details, and that's why you cannot just plug their answer here. – desertnaut May 10 '20 at 00:04
  • @desertnaut right. thanks for following my struggling today lol.. to get their heat map to work, what would you suggest? is there a way to tweak their answer or must I find a new way to plot the classification metric? – blue May 10 '20 at 00:08
  • There is probably a way, and it shouldn't be difficult - after all, the two classification reports are obviously not *that* different; but it really is not the kind of problem I'm good at - sorry :/ – desertnaut May 10 '20 at 00:10

1 Answer


Since classification_report gained the output_dict parameter, there is no need to parse the report text. Pass output_dict=True and load the result directly into a pd.DataFrame. Then use the DataFrame's style accessor (df.style) to render the heat map.
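To make the difference concrete, here is a small sketch comparing the two return types. The toy labels y_true and y_pred are made up for illustration. Without output_dict=True the function returns a plain string, which is most likely why pd.DataFrame(...) in the question raised "DataFrame constructor not properly called!".

```python
import pandas as pd
from sklearn.metrics import classification_report

# Toy labels, invented purely for illustration.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# Default: a formatted string meant for printing.
text_report = classification_report(y_true, y_pred)

# output_dict=True: a nested dict keyed by class label and summary row.
dict_report = classification_report(y_true, y_pred, output_dict=True)

print(type(text_report).__name__)
print(type(dict_report).__name__)
print(sorted(dict_report))

# The dict loads straight into a DataFrame; transpose so each
# class or summary row becomes a DataFrame row.
df = pd.DataFrame(dict_report).T
print(df)
```

Note that "accuracy" is a single float in the dict, so pandas repeats it across every column of that row.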

Example:

from sklearn.metrics import classification_report
import pandas as pd

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


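# Synthetic 10-class problem so the example is self-contained.
X, y = make_classification(n_samples=1000, n_features=30,
                           n_informative=12,
                           n_clusters_per_class=1, n_classes=10,
                           class_sep=2.0, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

clf = LogisticRegression(max_iter=1000, random_state=42).fit(X_train, y_train)

# classification_report expects (y_true, y_pred) in that order.
# output_dict=True returns a nested dict instead of a formatted string;
# digits is ignored in that mode, so it is omitted here.
df = pd.DataFrame(classification_report(y_test, clf.predict(X_test),
                                        output_dict=True)).T

# Support is a count, so show it as an integer.
df['support'] = df['support'].astype(int)

# Colour only the per-class rows ('0' to '9') and the three metric columns.
df.style.background_gradient(cmap='viridis',
                             subset=pd.IndexSlice['0':'9', :'f1-score'])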
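The '0':'9' slice is specific to this ten-class example. As a sketch of one way to avoid hard-coding the labels, the summary rows can be excluded by name instead. The dict below is typed in by hand from the six-class report in the question and only imitates the layout that output_dict=True produces. The names summary_rows, class_rows and metric_cols are illustrative.

```python
import pandas as pd

# Hand-copied from the six-class report in the question; mirrors the
# nested-dict layout returned by classification_report(..., output_dict=True).
report = {
    "0": {"precision": 1.00, "recall": 0.18, "f1-score": 0.31, "support": 11},
    "1": {"precision": 0.00, "recall": 0.00, "f1-score": 0.00, "support": 14},
    "2": {"precision": 0.00, "recall": 0.00, "f1-score": 0.00, "support": 19},
    "3": {"precision": 0.50, "recall": 0.77, "f1-score": 0.61, "support": 66},
    "4": {"precision": 0.39, "recall": 0.64, "f1-score": 0.49, "support": 47},
    "5": {"precision": 0.00, "recall": 0.00, "f1-score": 0.00, "support": 23},
    "accuracy": 0.46,
    "macro avg": {"precision": 0.32, "recall": 0.27, "f1-score": 0.23, "support": 180},
    "weighted avg": {"precision": 0.35, "recall": 0.46, "f1-score": 0.37, "support": 180},
}

df = pd.DataFrame(report).T
# The accuracy row has no real support, so its broadcast value truncates to 0.
df["support"] = df["support"].astype(int)

# Everything that is not a summary row is a per-class row.
summary_rows = ["accuracy", "macro avg", "weighted avg"]
class_rows = [label for label in df.index if label not in summary_rows]
metric_cols = ["precision", "recall", "f1-score"]

per_class = df.loc[class_rows, metric_cols]
print(per_class)
```

The same selection can then be passed to the styler as subset=pd.IndexSlice[class_rows, metric_cols], so the gradient covers only the per-class metrics whatever the labels are.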

Venkatachalam