I have a dataset with 48 feature columns and one binary classification target. After converting the categorical columns to numeric values (using one-hot encoding or similar), I can fit all the algorithms I have tried: linear, logistic regression, KNN, random forest, and boosting classifiers. However, when I skip that conversion and run algorithms such as random forest or decision tree directly on the raw data, I get the error "ValueError: could not convert string to float ...".
I am trying to build a baseline model without making any changes to the data. Please guide.
print(type(X))  # <class 'pandas.core.frame.DataFrame'>
print(type(y))  # <class 'pandas.core.series.Series'>
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
X_train_rf, X_test_rf, y_train_rf, y_test_rf = train_test_split(X,y,random_state=0)
randomforest = RandomForestClassifier()
randomforest.fit(X_train_rf, y_train_rf)
y_train_pred_rf=randomforest.predict(X_train_rf)
y_pred_rf= randomforest.predict(X_test_rf)
print('training accuracy',accuracy_score(y_train_rf,y_train_pred_rf))
print('test accuracy',accuracy_score(y_test_rf,y_pred_rf))
# The output obtained is:
ValueError: could not convert string to float: 'Delhi'  (# 'Delhi' is a value in one of the feature columns)
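For reference, scikit-learn's tree-based estimators expect numeric input, so string values such as 'Delhi' need some encoding before `fit`. The following is only a minimal sketch of one possible near-baseline, not a definitive solution: it swaps in `OrdinalEncoder` (instead of one-hot encoding) inside a `Pipeline`, so the raw DataFrame is passed in unchanged. The toy `X` and `y`, the column names `city` and `age`, and the variable names are assumptions made up for illustration and stand in for the real 48-column data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data standing in for the real X (48 columns) and y.
X = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Delhi", "Pune", "Mumbai", "Pune", "Delhi", "Pune"],
    "age": [25, 32, 47, 51, 38, 29, 44, 36],
})
y = pd.Series([0, 1, 0, 1, 1, 1, 0, 1])

# Non-numeric columns are the ones that trigger the ValueError.
cat_cols = X.select_dtypes(exclude="number").columns.tolist()

# Map each category to an integer code; categories unseen during fit become -1.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
preprocess = ColumnTransformer(
    [("cat", encoder, cat_cols)],
    remainder="passthrough",  # numeric columns are passed through untouched
)

baseline = Pipeline([
    ("preprocess", preprocess),
    ("rf", RandomForestClassifier(random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
baseline.fit(X_train, y_train)  # the raw DataFrame goes in as-is

y_train_pred = baseline.predict(X_train)
y_test_pred = baseline.predict(X_test)
train_acc = accuracy_score(y_train, y_train_pred)
test_acc = accuracy_score(y_test, y_test_pred)
print("training accuracy", train_acc)
print("test accuracy", test_acc)
```

Ordinal codes impose an arbitrary order on the categories. Tree models can usually split around that, but this is a design trade-off rather than a guarantee, so the result is worth comparing against the one-hot encoded version.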