I'm trying to generate a line graph, with x and y axes, that shows the accuracy of two classification algorithms: Naive Bayes and SVM.

I train/test the data like this:

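import matplotlib.pyplot as plt
import pandas as pd
from sklearn import linear_model, metrics, model_selection, naive_bayes, preprocessing
from sklearn.feature_extraction.text import TfidfVectorizer

# split the dataset into training and validation datasets
train_x, valid_x, train_y, valid_y = model_selection.train_test_split(result['post'], result['type'], test_size=0.30, random_state=1)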

# label encode the target variable
encoder = preprocessing.LabelEncoder()
train_y = encoder.fit_transform(train_y)
# reuse the mapping learned on the training labels instead of refitting
valid_y = encoder.transform(valid_y)

def tokenizersplit(text):
    # split on whitespace; the parameter is named text to avoid shadowing the built-in str
    return text.split()
tfidf_vect = TfidfVectorizer(tokenizer=tokenizersplit, encoding='utf-8', min_df=2, ngram_range=(1, 2), max_features=25000)

# learn the vocabulary and IDF weights (the standalone transform call was discarded, so it is dropped)
tfidf_vect.fit(result['post'])

xtrain_tfidf = tfidf_vect.transform(train_x)
xvalid_tfidf = tfidf_vect.transform(valid_x)

def train_model(classifier, trains, t_labels, valids, v_labels):
    # fit the training dataset on the classifier
    classifier.fit(trains, t_labels)

    # predict the labels on validation dataset
    predictions = classifier.predict(valids)

    # accuracy_score expects the true labels first, then the predictions
    return metrics.accuracy_score(v_labels, predictions)

# Naive Bayes
accuracy = train_model(naive_bayes.MultinomialNB(), xtrain_tfidf, train_y, xvalid_tfidf, valid_y)
print("NB accuracy:", accuracy)

However, for an assignment I need something plotted on x/y axes using matplotlib. I tried this:

m=linear_model.LogisticRegression()
m.fit(xtrain_tfidf, train_y)
y_pred = m.predict(xvalid_tfidf)
print(metrics.classification_report(valid_y, y_pred))
plt.plot(valid_y, y_pred)
plt.show()

But this gives me:

[image: plot produced by the code above]

I need something that makes it easier to compare the accuracy of Naive Bayes vs. SVM vs. another algorithm. How can I do this?

I also tried plotting the classification report:

plt.plot(metrics.classification_report(valid_y, y_pred))
plt.show()

[image: plot produced by plotting the classification report]

My classification output:

              precision    recall  f1-score   support

           0       1.00      0.18      0.31        11
           1       0.00      0.00      0.00        14
           2       0.00      0.00      0.00        19
           3       0.50      0.77      0.61        66
           4       0.39      0.64      0.49        47
           5       0.00      0.00      0.00        23

    accuracy                           0.46       180
   macro avg       0.32      0.27      0.23       180
weighted avg       0.35      0.46      0.37       180

Error after trying the suggested edit:

df = pd.DataFrame(metrics.classification_report(valid_y, y_pred)).transpose()

gives the error:

ValueError: DataFrame constructor not properly called!

blue
  • Classification report is a table, and it is not meant to be plotted - try to run it first simply as `classification_report(valid_y, y_pred)` to see what it returns. – desertnaut May 09 '20 at 23:17
  • @desertnaut right. I don't necessarily need to use the classification report here; however, if I try to plot just that, I get nothing (see the image in my update) – blue May 09 '20 at 23:24
  • Can you post the output of `metrics.classification_report(valid_y, y_pred)`? If it's a table, you can just scatterplot both of the axes by passing `plt.scatter(x=..,y=..,..)` – Hirak Sarkar May 10 '20 at 00:10
  • @HirakSarkar yes, see my edit; it's a table – blue May 10 '20 at 00:25
  • Can you also print `metrics.classification_report(valid_y, y_pred).shape`? The table seems to be truncated; it seems there are more than 4 columns. Maybe store it in a variable: `df = metrics.classification_report(valid_y, y_pred)`, then `print(df.shape)` and `print(df.columns)` – Hirak Sarkar May 10 '20 at 00:41
  • @HirakSarkar I get the error - AttributeError: 'str' object has no attribute 'shape' – blue May 10 '20 at 00:43
  • That means it's not a table; it's just a long string and not meant for plotting. You can, however, use https://stackoverflow.com/questions/28200786/how-to-plot-scikit-learn-classification-report to get a heatmap. Hope this helps. PS: see this answer in particular: https://stackoverflow.com/a/34304414/4005668 – Hirak Sarkar May 10 '20 at 00:47
  • @HirakSarkar ok, great. How can I see the non-truncated string, or the values past the first 5? – blue May 10 '20 at 00:49
  • I expanded the comment into an answer. Please check. – Hirak Sarkar May 10 '20 at 00:53

1 Answer

metrics.classification_report summarizes the prediction results as a formatted text report, so it is meant for printing, not for plotting. If you want the table in a visual format, you can follow https://stackoverflow.com/a/34304414/4005668.
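To see why the report cannot be plotted directly, a minimal sketch on toy labels (the `y_true` and `y_pred` lists below are invented for illustration and are not the data from the question) compares the default return value with the one produced by the `output_dict=True` parameter:

```python
from sklearn.metrics import classification_report

# Toy labels, invented purely for illustration (assumption, not the question's data)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# Default: a single formatted string, suitable only for printing
text_report = classification_report(y_true, y_pred)
print(type(text_report).__name__)  # str

# With output_dict=True: a nested dict keyed by class label and summary row
dict_report = classification_report(y_true, y_pred, output_dict=True)
print(type(dict_report).__name__)  # dict

# Individual numbers can then be pulled out for plotting
accuracy = dict_report["accuracy"]
macro_f1 = dict_report["macro avg"]["f1-score"]
print(accuracy, macro_f1)
```

The string form has no `.shape` or `.columns`, which is consistent with the AttributeError reported in the comments; the dict form exposes plain numbers.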

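Alternatively, you can capture the report in a dataframe. By default classification_report returns a string, which pd.DataFrame cannot parse, so pass output_dict=True to get a dict instead:

import pandas as pd
# put it in a dataframe (output_dict=True returns a dict rather than a string)
df = pd.DataFrame(metrics.classification_report(.., output_dict=True)).transpose()
# plot the dataframe
df.plot()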
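For the comparison the question asks about, one possible approach is a bar chart with one bar per classifier. The following is only a sketch under assumptions: the data is synthetic (`make_classification`), `GaussianNB` stands in for `MultinomialNB` because the synthetic features can be negative, and the output file name is hypothetical. With the real data you would pass `xtrain_tfidf`, `train_y`, `xvalid_tfidf` and `valid_y` instead.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC

# Synthetic stand-in data (assumption): replace with the TF-IDF matrices and labels
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=3, random_state=1)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.30, random_state=1)

# Classifiers to compare; the display names are arbitrary
classifiers = {
    "Naive Bayes": GaussianNB(),
    "Linear SVM": LinearSVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Fit each classifier and record its accuracy on the validation split
accuracies = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_valid, clf.predict(X_valid))

# One bar per classifier: x axis = classifier, y axis = accuracy
fig, ax = plt.subplots()
ax.bar(list(accuracies.keys()), list(accuracies.values()))
ax.set_xlabel("Classifier")
ax.set_ylabel("Validation accuracy")
ax.set_ylim(0, 1)
ax.set_title("Accuracy by classifier")
fig.savefig("accuracy_comparison.png")  # hypothetical file name; plt.show() also works interactively
```

A bar chart is used rather than a line graph because the classifiers are unordered categories; a line connecting them would suggest a trend that does not exist.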
Hirak Sarkar
  • Right, however that answer is outdated and led to my secondary question here https://stackoverflow.com/questions/61705257/sklearn-plotting-classification-report-gives-a-different-output-than-basic-avg/61708658#61708658 – blue May 10 '20 at 22:50
  • I'm getting an error when plugging my own classification report into the dataframe; see my edit. – blue May 10 '20 at 22:50
  • Can you upload a Jupyter notebook or something similar so that I can reproduce the error? With my toy examples it does not raise an error. – Hirak Sarkar May 14 '20 at 14:43