How to calculate the accuracy?

Question

I'm trying to calculate the accuracy for a Twitter sentiment analysis project, but I get the error below. Could anyone help me calculate the accuracy? Thanks.

Error: ValueError: Classification metrics can't handle a mix of continuous and multiclass targets

My code:

import re
import pickle
import numpy as np
import pandas as pd


from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.metrics import accuracy_score

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
df = pd.read_csv("updated_tweet_info.csv")
data =  df.fillna(' ')

train,test = train_test_split(data, test_size = 0.2, random_state = 42)

train_clean_tweet=[]
for tweet in train['tweet']:
    train_clean_tweet.append(tweet)
test_clean_tweet=[]
for tweet in test['tweet']:
    test_clean_tweet.append(tweet)

v = CountVectorizer(analyzer = "word")
train_features= v.fit_transform(train_clean_tweet)
test_features=v.transform(test_clean_tweet)


lr = RandomForestRegressor(n_estimators=200)
fit1 = lr.fit(train_features, train['clean_polarity'])
pred = fit1.predict(test_features)
accuracy = accuracy_score(pred, test['clean_polarity'])

Veguinho · Answer 1 · 2020-10-04T23:00:48.733

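You are calling accuracy_score, but accuracy is a classification metric, and RandomForestRegressor returns continuous predictions. That is why scikit-learn complains about a mix of continuous and multiclass targets.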
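To make the mismatch concrete, here is a minimal pure-Python sketch. The `accuracy` and `nearest_label` helpers and the toy values are made up for illustration (they are not scikit-learn functions), and it assumes clean_polarity holds discrete labels such as -1, 0 and 1, which the question does not confirm.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly equal the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def nearest_label(value, labels):
    """Map a continuous prediction to the closest allowed class label."""
    return min(labels, key=lambda label: abs(label - value))


# Hypothetical data: true labels are discrete, regressor output is continuous.
labels = [-1, 0, 1]
y_true = [1, 0, -1, 1]
y_pred_continuous = [0.8, 0.1, -0.4, 0.2]

# Exact matching against the raw regressor output finds no matches.
raw_accuracy = accuracy(y_true, y_pred_continuous)

# After snapping each prediction to a label, accuracy is meaningful.
y_pred_labels = [nearest_label(p, labels) for p in y_pred_continuous]
snapped_accuracy = accuracy(y_true, y_pred_labels)

print(raw_accuracy, y_pred_labels, snapped_accuracy)
```

If the target really is a set of class labels, another option is to train a classifier such as RandomForestClassifier instead of a regressor, so that predict() returns labels directly and accuracy_score applies.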

In your case, try a regression metric instead, such as mean_squared_error(), and then apply np.sqrt() to the result. This gives you the root mean squared error (RMSE). The lower the number, the better.

Try this:

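 import numpy as np
 from sklearn.metrics import mean_squared_error
 # RMSE between the true polarity values and the regressor's predictions
 rmse = np.sqrt(mean_squared_error(test['clean_polarity'], pred))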
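For reference, this is roughly what that one-liner computes, written out by hand. The `rmse` helper and the toy numbers are hypothetical and only meant to show the arithmetic, not to replace the scikit-learn call.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt of the mean of squared differences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical polarity values and predictions.
y_true = [1.0, 0.0, -1.0, 1.0]
y_pred = [0.5, 0.0, -0.5, 1.0]

score = rmse(y_true, y_pred)
print(score)
```

A perfect prediction gives an RMSE of 0, and the value is expressed in the same units as the target, which is why it has to be read against the target's scale.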

Someone else also ran into the same problem.

edited Oct 04 '20 at 23:00

answered Oct 04 '20 at 22:39


Ok, so what is considered low? – user123456789 Oct 05 '20 at 00:55
It depends on the scale of the target. Here CountVectorizer() encodes how many times each word is used, so the model will probably estimate a sentiment number from the word counts in a tweet. If that sentiment number goes from 0 to 100, then an RMSE of 0.1 is low, but an RMSE of 10 would be a lot. If it has a bigger scale, for example from 0 to 1,000,000, then an RMSE of 10 becomes small compared to the range of the sentiment value. – Veguinho Oct 06 '20 at 01:21
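One way to make "low" comparable across scales is to divide the RMSE by the range of the target. The helper name and the ranges below are assumptions chosen to mirror the comment above, not values from the asker's data.

```python
def rmse_relative_to_range(rmse, target_min, target_max):
    """Express an RMSE as a fraction of the target's value range."""
    span = target_max - target_min
    if span <= 0:
        raise ValueError("target_max must be greater than target_min")
    return rmse / span


# Same RMSE of 10, read against two hypothetical target ranges.
on_small_scale = rmse_relative_to_range(10, 0, 100)
on_large_scale = rmse_relative_to_range(10, 0, 1_000_000)

print(on_small_scale, on_large_scale)
```

On the 0 to 100 range the error is a tenth of the whole scale, while on the larger range it is a tiny fraction of it, so the same RMSE can be large or negligible depending on the target.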
