I'm trying to generate a line graph, with x and y axes, that compares the accuracy of two different classification algorithms: Naive Bayes and SVM.
I train/test the data like this:
# split the dataset into training and validation datasets
train_x, valid_x, train_y, valid_y = model_selection.train_test_split(result['post'], result['type'], test_size=0.30, random_state=1)
# label encode the target variable
encoder = preprocessing.LabelEncoder()
train_y = encoder.fit_transform(train_y)
# reuse the mapping fitted on train_y so both sets share the same label codes
valid_y = encoder.transform(valid_y)
def tokenizersplit(text):
    # split on whitespace; the parameter is named `text` to avoid shadowing the built-in str
    return text.split()
tfidf_vect = TfidfVectorizer(tokenizer=tokenizersplit, encoding='utf-8', min_df=2, ngram_range=(1, 2), max_features=25000)
# fit the vocabulary on the training posts only, so validation data does not leak into it
tfidf_vect.fit(train_x)
xtrain_tfidf = tfidf_vect.transform(train_x)
xvalid_tfidf = tfidf_vect.transform(valid_x)
def train_model(classifier, trains, t_labels, valids, v_labels):
    # fit the training dataset on the classifier
    classifier.fit(trains, t_labels)
    # predict the labels on validation dataset
    predictions = classifier.predict(valids)
    # accuracy_score expects (y_true, y_pred)
    return metrics.accuracy_score(v_labels, predictions)
# Naive Bayes
accuracy = train_model(naive_bayes.MultinomialNB(), xtrain_tfidf, train_y, xvalid_tfidf, valid_y)
print("NB accuracy:", accuracy)
However, for an assignment I need to plot the results on x/y axes using matplotlib. I tried this:
m=linear_model.LogisticRegression()
m.fit(xtrain_tfidf, train_y)
y_pred = m.predict(xvalid_tfidf)
print(metrics.classification_report(valid_y, y_pred))
plt.plot(valid_y, y_pred)
plt.show()
But this gives me:
I need something that makes it easy to compare the accuracy of Naive Bayes vs. SVM vs. another algorithm. How can I do this?
I also tried plotting the classification report:
plt.plot(metrics.classification_report(valid_y, y_pred))
plt.show()
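For the accuracy comparison asked about above, one possible approach is to collect one accuracy score per classifier and draw them as a bar chart. The following is only a minimal sketch under stated assumptions: the synthetic data from `make_classification` stands in for the TF-IDF features, and the choice of `LinearSVC`, the `classifiers` dictionary and the output file name are illustrative, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view the plot with plt.show()
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model, metrics, model_selection, naive_bayes, svm
from sklearn.datasets import make_classification

# Synthetic stand-in for xtrain_tfidf / xvalid_tfidf (assumption, not the real data)
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=3, random_state=1)
X = np.abs(X)  # MultinomialNB requires non-negative features, as TF-IDF values are

train_x, valid_x, train_y, valid_y = model_selection.train_test_split(
    X, y, test_size=0.30, random_state=1)

# Hypothetical set of models to compare
classifiers = {
    "Naive Bayes": naive_bayes.MultinomialNB(),
    "SVM": svm.LinearSVC(),
    "Logistic Regression": linear_model.LogisticRegression(max_iter=1000),
}

# Fit each model and record its accuracy on the validation split
accuracies = {}
for name, clf in classifiers.items():
    clf.fit(train_x, train_y)
    accuracies[name] = metrics.accuracy_score(valid_y, clf.predict(valid_x))

# One bar per classifier: x axis = classifier, y axis = accuracy
fig, ax = plt.subplots()
ax.bar(list(accuracies.keys()), list(accuracies.values()))
ax.set_xlabel("Classifier")
ax.set_ylabel("Validation accuracy")
ax.set_ylim(0, 1)
fig.savefig("accuracy_comparison.png")
```

A bar chart is used here instead of a line graph because the classifiers are unordered categories; `ax.plot` with the same two lists would draw a line through the same points if a line graph is required.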
My classification output:
              precision    recall  f1-score   support

           0       1.00      0.18      0.31        11
           1       0.00      0.00      0.00        14
           2       0.00      0.00      0.00        19
           3       0.50      0.77      0.61        66
           4       0.39      0.64      0.49        47
           5       0.00      0.00      0.00        23

    accuracy                           0.46       180
   macro avg       0.32      0.27      0.23       180
weighted avg       0.35      0.46      0.37       180
Error with the edit:
df = pd.DataFrame(metrics.classification_report(valid_y, y_pred)).transpose()
gives the error:
ValueError: DataFrame constructor not properly called!
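A likely cause of this error is that `classification_report` returns a plain string by default, which the `DataFrame` constructor cannot interpret. Passing `output_dict=True` (available in scikit-learn 0.20 and later) returns a nested dictionary instead. The sketch below uses small hypothetical label lists in place of the real `valid_y` and `y_pred`, and the output file name is also an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view the plot with plt.show()
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import metrics

# Hypothetical labels standing in for valid_y and y_pred
valid_y = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 0, 2, 1]

# output_dict=True gives {label: {metric: value}} instead of a formatted string
report = metrics.classification_report(valid_y, y_pred, output_dict=True)

# "accuracy" is a single number rather than a per-metric dict, so keep it separate
overall_accuracy = report.pop("accuracy", None)

# Rows become classes and averages, columns become precision/recall/f1-score/support
df = pd.DataFrame(report).transpose()

# Grouped bars: one group per class, one bar per metric
ax = df[["precision", "recall", "f1-score"]].plot(kind="bar")
ax.set_xlabel("Class")
ax.set_ylabel("Score")
ax.set_ylim(0, 1)
ax.figure.tight_layout()
ax.figure.savefig("classification_report.png")
```

The `support` column is left out of the plot because it holds sample counts, which are on a different scale from the 0 to 1 scores.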