I have a credit scoring dataset and need to classify whether a customer will default or not.

   LIMIT_BAL  gender  EDUCATION  MARRIAGE  AGE    SEP_STATUS    AUG_STATUS    JUL_STATUS    JUN_STATUS      MAY_STATUS  ...  JUN_BAL  MAY_BAL  APR_BAL  SEP_PAID  AUG_PAID  JUL_PAID  JUN_PAID  MAY_PAID  APR_PAID  default_0
0      20000  female   bachelor   married   24          2 mo          2 mo          paid          paid  no need to pay  ...        0        0        0         0       689         0         0         0         0        bad
1      90000  female   bachelor    single   34  using credit  using credit  using credit  using credit    using credit  ...    14331    14948    15549      1518      1500      1000      1000      1000      5000       good

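from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

dec_class = DecisionTreeClassifier(random_state=17)
y = df['default_0']   # target column
x = df.iloc[:, :-1]   # every column except the target

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=17)

dec_class.fit(X_train, y_train)   # fit on the training split only

ValueError: could not convert string to float: 'female'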

I thought decision trees work equally well with categorical and numerical features. I converted the categorical features to words; they were all numerically coded before. Why does the classifier not accept the same categorical features as words, for example gender as 'male' and 'female'?

Erfan
kaban
  • Does this answer your question? [Passing categorical data to Sklearn Decision Tree](https://stackoverflow.com/questions/38108832/passing-categorical-data-to-sklearn-decision-tree) – fuglede Nov 01 '19 at 10:30
  • You cannot pass strings to a decision tree. You have to encode them. – Erfan Nov 01 '19 at 10:31
  • @fuglede, this is definitely helpful, thanks, but I will read it again and decide. – kaban Nov 01 '19 at 10:34
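
For reference, here is a minimal sketch of the encoding step suggested in the comments. It is not taken from the thread: the toy rows, the `categorical` list and the choice of one-hot encoding are assumptions made for illustration, and the real frame has many more columns. scikit-learn's decision tree implementation expects numeric input, so the string columns are encoded inside a pipeline before the tree sees them.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy rows that only mimic a few columns from the question.
df = pd.DataFrame({
    "LIMIT_BAL": [20000, 90000, 50000, 120000],
    "gender": ["female", "female", "male", "male"],
    "MARRIAGE": ["married", "single", "married", "single"],
    "AGE": [24, 34, 37, 29],
    "default_0": ["bad", "good", "good", "bad"],
})

y = df["default_0"]
x = df.drop(columns="default_0")

# Assumed list of string-valued columns; extend it to match the real frame.
categorical = ["gender", "MARRIAGE"]

# One-hot encode the string columns and pass the numeric ones through unchanged.
encode = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
    remainder="passthrough",
)

model = make_pipeline(encode, DecisionTreeClassifier(random_state=17))
model.fit(x, y)  # the string columns are encoded before the tree is fitted
predictions = model.predict(x)
```

Keeping the encoder inside the pipeline means the same transformation is applied at `predict` time, and `handle_unknown="ignore"` avoids an error if a category shows up in the test split that was absent from the training split. `OrdinalEncoder` would be another option for columns with a natural order, such as the payment status columns.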
