
So I made a naive Bayes model for my machine learning project. I built it from two data frames: `df` as the training set and `df1` as the test set. `X_train` is every column in `df` except `Interest_Rate`, `Y_train` is the `Interest_Rate` column of `df`, and `X_test` is every column in `df1` except `Interest_Rate`. I want to predict the test labels (`y_pred`) with this code:

    from sklearn.naive_bayes import GaussianNB
    X_train = df.drop(columns = ['Interest_Rate'])
    Y_train = df['Interest_Rate']
    X_test = df1.drop(columns = ['Interest_Rate'])

    gnb = GaussianNB()
    gnb.fit(X_train,Y_train)
    y_pred = gnb.predict(X_test)
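`GaussianNB` expects every feature to be numeric, so each column of `X_train` and `X_test` has to convert cleanly to float before `fit` is called. As a minimal sketch, assuming a small made-up frame in place of `df` (the column names `Loan_Amount`, `Length_Employed` and `Annual_Income` are hypothetical, not taken from the question), this lists the columns that still hold non-numeric data:

```python
import pandas as pd

# Hypothetical stand-in for df; the real column names and values are not known.
df = pd.DataFrame({
    "Loan_Amount": ["7,000", "12,500", "3,000"],
    "Length_Employed": ["< 1 year", "2 years", "10+ years"],
    "Annual_Income": [52000.0, 61000.0, 38000.0],
    "Interest_Rate": [1, 2, 3],
})

X_train = df.drop(columns=["Interest_Rate"])

# Any column whose dtype is not numeric will make GaussianNB.fit fail when it
# tries to convert the frame to a float array.
non_numeric = [c for c in X_train.columns
               if not pd.api.types.is_numeric_dtype(X_train[c])]
print(non_numeric)  # ['Loan_Amount', 'Length_Employed']
```

Running the same check on `df1` as well shows whether the test set needs the same cleaning.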

But when I run it, it returns:

    ValueError                                Traceback (most recent call last)
    <ipython-input-88-46a161c5db08> in <module>()
          5 
          6 gnb = GaussianNB()
    ----> 7 gnb.fit(X_train,Y_train)
          8 y_pred = gnb.predict(X_test)

    6 frames
    /usr/local/lib/python3.7/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
         81 
         82     """
    ---> 83     return array(a, dtype, copy=False, order=order)
         84 
         85 

    ValueError: could not convert string to float: '7,000'
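The value `'7,000'` uses a comma as a thousands separator, which `float()` cannot parse. A minimal sketch of two ways to handle it, assuming a hypothetical column name `Loan_Amount` and made-up CSV text: strip the commas from an already-loaded column, or let `pd.read_csv` remove them at load time with its `thousands` parameter (the approach in the question linked in the comments).

```python
import io

import pandas as pd

# Option 1: clean a column that is already loaded (column name is hypothetical).
df = pd.DataFrame({"Loan_Amount": ["7,000", "12,500", "350"]})
df["Loan_Amount"] = pd.to_numeric(
    df["Loan_Amount"].str.replace(",", "", regex=False)  # drop thousands separators
)
print(df["Loan_Amount"].tolist())  # [7000, 12500, 350]

# Option 2: have read_csv strip the separator while parsing (CSV text is made up).
csv_text = 'Loan_Amount,Interest_Rate\n"7,000",1\n"12,500",2\n'
df_loaded = pd.read_csv(io.StringIO(csv_text), thousands=",")
print(df_loaded["Loan_Amount"].tolist())  # [7000, 12500]
```

Option 2 is usually simpler when the data comes straight from a CSV, because every numeric column is parsed correctly in one step.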

Where did I go wrong?

18818181881
  • Can you check [this](https://stackoverflow.com/questions/37439933/pandas-reading-csv-data-formatted-with-comma-for-thousands-separator)? – jezrael Apr 26 '21 at 09:03
  • Now the error is `could not convert string to float: '< 1 year'`. – 18818181881 Apr 26 '21 at 09:08
  • Always look at the data first (e.g. with `print()`) to check what you need to preprocess before using it for machine learning. Your error shows that you have strings like `'7,000'`, where the thousands-separator comma has to be removed, and strings like `'< 1 year'`, which you should replace with some float/integer value. That is your mistake: you assumed that all the data was already correct. – furas Apr 26 '21 at 09:44
  • The problem is that I already encoded that value myself, so I was wondering why that data still appears. – 18818181881 Apr 26 '21 at 09:58
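On why an "already encoded" value can still show up: pandas methods such as `replace` return a new object by default, so if the result is not assigned back, the frame keeps the original strings. The encoding also has to be applied to both `df` and `df1`. A minimal sketch, assuming a hypothetical `Length_Employed` column and assuming `'< 1 year'` should become 0 (the helper `encode_years` is illustrative, not from the question):

```python
import pandas as pd

# Hypothetical stand-ins for df (training set) and df1 (test set).
df = pd.DataFrame({"Length_Employed": ["< 1 year", "1 year", "5 years", "10+ years"]})
df1 = pd.DataFrame({"Length_Employed": ["2 years", "< 1 year"]})

# Common pitfall: replace() returns a new Series; without assignment df is unchanged.
df["Length_Employed"].replace("< 1 year", "0")
print(df["Length_Employed"].iloc[0])  # still '< 1 year'

def encode_years(s: pd.Series) -> pd.Series:
    # Assumption: '< 1 year' means 0 years; otherwise keep the first number in the text.
    s = s.replace("< 1 year", "0")
    return pd.to_numeric(s.str.extract(r"(\d+)", expand=False))

# Apply the same encoding to BOTH frames and assign the result back.
for frame in (df, df1):
    frame["Length_Employed"] = encode_years(frame["Length_Employed"])

print(df["Length_Employed"].tolist())   # [0, 1, 5, 10]
print(df1["Length_Employed"].tolist())  # [2, 0]
```

Using one helper for both frames keeps the training and test features encoded consistently.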
