I have a problem in my Naive Bayes project. I assign the data to variables by category so that the rows are separated according to their labels. After the TF-IDF scoring, when the data is split into train and test sets, some of the variables no longer work properly, and fitting the model raises ValueError: y should be a 1d array, got an array of shape (402, 9) instead.

spreadsheet dataset preprocessed

import pandas as pd
df = pd.read_csv('dataset_preprocessed.csv')
df
# Keep only the rows whose Sentimen value is not the literal string "Sentimen"
sentimen = df[df.Sentimen.apply(lambda x: x !="Sentimen")]

TF-IDF

from sklearn.feature_extraction.text import TfidfVectorizer

tf = TfidfVectorizer()
text_tf = tf.fit_transform(df['Tweet_Stemmed'].astype('U'))
print(text_tf)
import pandas as pd
df = pd.DataFrame(text_tf.todense().T,
                  index=tf.get_feature_names(),
                  columns=[f'D{i+1}' for i in range(len(df['Tweet_Stemmed']))])
df.head(20)

Train Test Split

#Train Test Split
from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test  = train_test_split(text_tf, sentimen, test_size=0.7, random_state=3000)

Apply Predict Sentiment Naive Bayes

from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
clf = MultinomialNB()
clf.fit(X_train, y_train)
predicted = clf.predict(X_test)

print("MultinomialNB Accuracy:", accuracy_score(y_test,predicted))

ERROR

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-118-90624fc1891c> in <module>
      4 from sklearn.metrics import confusion_matrix
      5 clf = MultinomialNB()
----> 6 clf.fit(X_train, y_train)
      7 predicted = clf.predict(X_test)
      8 

5 frames
/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py in column_or_1d(y, warn)
   1037 
   1038     raise ValueError(
-> 1039         "y should be a 1d array, got an array of shape {} instead.".format(shape)
   1040     )
   1041 

ValueError: y should be a 1d array, got an array of shape (402, 9) instead.

I have tried changing the values of test_size and random_state. I then thought about taking the labels from the Sentimen column of df instead. Unfortunately, df now contains only the TF-IDF scores, and the table with the Sentimen and Tweet_Stemmed columns is no longer there.
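A possible reading of the error: the shape (402, 9) suggests that y_train is a table with 9 columns rather than a single column of labels, which is what happens when a boolean filter returns whole rows of df. The sketch below is only an illustration under assumptions. The toy tweets stand in for dataset_preprocessed.csv, and the names data, scores and terms are hypothetical. It filters the rows first, takes only the Sentimen column as y, and stores the TF-IDF score table under a separate name so the original columns stay available.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for dataset_preprocessed.csv.
# The last row imitates a stray header row that ended up in the data.
data = pd.DataFrame({
    "Tweet_Stemmed": [
        "produk bagus sekali", "layanan cepat puas", "suka banget barang",
        "kualitas mantap senang", "produk jelek kecewa", "layanan lambat buruk",
        "barang rusak parah", "kecewa banget kualitas", "Tweet_Stemmed",
    ],
    "Sentimen": ["positif"] * 4 + ["negatif"] * 4 + ["Sentimen"],
})

# Filter rows BEFORE vectorizing so features and labels stay aligned.
data = data[data["Sentimen"] != "Sentimen"].reset_index(drop=True)

# Select one column: this is a 1-D Series, not a multi-column DataFrame.
y = data["Sentimen"]

tf = TfidfVectorizer()
X = tf.fit_transform(data["Tweet_Stemmed"].astype("U"))

# Keep the score table under its own name instead of overwriting the data.
terms = sorted(tf.vocabulary_, key=tf.vocabulary_.get)  # terms in column order
scores = pd.DataFrame(X.toarray().T, index=terms,
                      columns=[f"D{i+1}" for i in range(X.shape[0])])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

clf = MultinomialNB()
clf.fit(X_train, y_train)  # y_train is 1-D, so the shape check passes
predicted = clf.predict(X_test)
accuracy = accuracy_score(y_test, predicted)
print("MultinomialNB Accuracy:", accuracy)
```

If the real labels are stored in a column with a different name, only the column selected for y needs to change. Printing y.shape before calling fit is a quick way to confirm it has a single dimension.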
