I have a credit scoring dataset and need to classify whether a customer will default or not.
LIMIT_BAL | gender | EDUCATION | MARRIAGE | AGE | SEP_STATUS | AUG_STATUS | JUL_STATUS | JUN_STATUS | MAY_STATUS | ... | JUN_BAL | MAY_BAL | APR_BAL | SEP_PAID | AUG_PAID | JUL_PAID | JUN_PAID | MAY_PAID | APR_PAID | default_0
20000 | female | bachelor | married | 24 | 2 mo | 2 mo | paid | paid | no need to pay | ... | 0 | 0 | 0 | 0 | 689 | 0 | 0 | 0 | 0 | bad
90000 | female | bachelor | single | 34 | using credit | using credit | using credit | using credit | using credit | ... | 14331 | 14948 | 15549 | 1518 | 1500 | 1000 | 1000 | 1000 | 5000 | good
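To make this easier to reproduce, here is a small stand-in frame with the same kind of columns (only a handful of them, values taken from the two rows above):

import pandas as pd

# tiny stand-in for the real data: a few columns with the same dtypes as my frame
df = pd.DataFrame({
    'LIMIT_BAL': [20000, 90000],
    'gender': ['female', 'female'],
    'MARRIAGE': ['married', 'single'],
    'AGE': [24, 34],
    'SEP_STATUS': ['2 mo', 'using credit'],
    'APR_PAID': [0, 5000],
    'default_0': ['bad', 'good'],
})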
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# target is the last column ('default_0'); everything else is a feature
y = df['default_0']
x = df.iloc[:, :-1]
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=17)

dec_class = DecisionTreeClassifier(random_state=17)
dec_class.fit(X_train, y_train)  # fails here
This raises:

ValueError: could not convert string to float: 'female'
I thought decision trees work equally well with categorical and numerical features. The categorical columns originally held numeric codes, and I preprocessed them into words. Why does the classifier not accept the same categorical features as words, e.g. gender = 'male' / 'female'?
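For context, the original file had numeric codes in these columns and I mapped them to words with something roughly like this (the mapping values below are just an illustration, not my exact dictionaries):

# rough sketch of the preprocessing: numeric codes -> words
# (mappings shown here are illustrative, not the exact dictionaries I used)
df['gender'] = df['gender'].map({1: 'male', 2: 'female'})
df['MARRIAGE'] = df['MARRIAGE'].map({1: 'married', 2: 'single', 3: 'other'})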