I'm trying to train a decision tree in R using both categorical and numeric data to predict whether a customer purchased something. When I plot the tree to look at the splits, it completely ignores gender.

As seen below, I encoded the gender variable as 1 and 2. There's a roughly even split between males and females. I didn't scale any features.

head(df1)
  Gender Age EstimatedSalary Purchased
1      2  19              19         0
2      2  35              20         0
3      1  26              43         0
4      1  27              57         0
5      2  19              76         0
6      2  27              58         0

I can provide this link showing the decision tree.

Is gender simply not significant for this prediction, or am I missing something else?

altec
  • Yes, it does. You should set your categorical variables as factors using `?factor` – Brandon Bertelsen Oct 04 '18 at 00:07
  • @BrandonBertelsen Yeah I'd already done that – altec Oct 04 '18 at 08:33
  • Without a reprex, it's hard to see that from what you've posted. See: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. It's not enough to "encode" them; they have to be factors too. Please show us more, for example the output of `summary(rpart.model)`, where `rpart.model` is the variable that holds your model after running `rpart()` – Brandon Bertelsen Oct 04 '18 at 09:20
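
A minimal sketch of the kind of reproducible example the comments ask for. It assumes the tree is fit with `rpart()`, which the question does not state. The `df1` built here is synthetic, and the rule that generates `Purchased` is invented purely for illustration; it is not the asker's dataset. The sketch converts the coded columns to factors, confirms the types with `str()`, and prints the model summary.

```r
library(rpart)

# Hypothetical stand-in for df1: synthetic data, NOT the asker's dataset.
set.seed(42)
n <- 200
df1 <- data.frame(
  Gender          = sample(1:2, n, replace = TRUE),
  Age             = sample(18:60, n, replace = TRUE),
  EstimatedSalary = sample(15:150, n, replace = TRUE)
)
# Assumed rule, for illustration only: purchase depends on age and salary.
df1$Purchased <- as.integer(df1$Age > 40 | df1$EstimatedSalary > 100)

# Convert the coded columns to factors so they are treated as categories
# rather than as numbers.
df1$Gender    <- factor(df1$Gender, levels = c(1, 2))
df1$Purchased <- factor(df1$Purchased, levels = c(0, 1))

# Confirm the column types before fitting.
str(df1)

# Fit a classification tree on all three predictors.
rpart.model <- rpart(Purchased ~ Gender + Age + EstimatedSalary,
                     data = df1, method = "class")

# The summary lists the primary and surrogate splits considered at each node.
summary(rpart.model)

# Importance scores for the variables that took part in primary or surrogate splits.
rpart.model$variable.importance
```

A variable that never appears in the plotted tree can still show up in the `summary()` output among the competing or surrogate splits, which helps tell "never a useful split" apart from "not set up as a factor".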