I have a dataset with a binary target (good clients vs. bad clients), with one row per client and about 150 variables.
I wish to do the following:
- Predict which clients are bad.
- Calculate a score of how bad each client is.
I want to use random forests for the prediction and logistic regression for the score (the probability of being bad, which gives a score between 0 and 1).
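For reference, the reason logistic regression yields a score in that range: it models the log-odds of being bad as linear in the predictors, so the fitted probability for client $i$ is passed through the logistic function. Here $x_i$ stands for the client's predictor vector, with factors entering as dummy columns.

$$
p_i = P(\text{bad} \mid x_i) = \frac{1}{1 + \exp\left(-(\beta_0 + x_i^\top \beta)\right)} \in (0, 1)
$$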
I have these problems:
- The `randomForest` package doesn't accept missing values (I get an error message), and I don't know, technically, how to tell R to impute or omit them.
- In logistic regression, how do I obtain the score for each subject (the probability of being a bad client)?
- In general, when I fit a model in R (for example with the `randomForest` package) that needs a formula like `Y ~ X1 + X2 + ...`, how can I tell R to include all variables `X1` to `X150` in the model?
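A minimal sketch in R for the last two points, on small simulated data. The data frame `clients`, the use of five predictors instead of 150, and the coding `Client = 1` for a bad client are all assumptions. `Client ~ .` uses every other column as a predictor, `reformulate()` builds the same formula from a character vector of names, and `predict(..., type = "response")` returns the fitted probability for each row.

```r
# Hypothetical simulated stand-in for the real data: outcome 'Client'
# coded 0/1 (assumed: 1 = bad client), here only 5 predictors, not 150.
set.seed(1)
n <- 200
clients <- data.frame(
  Client = rbinom(n, 1, 0.3),
  X1 = rnorm(n),
  X2 = rnorm(n),
  X3 = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  X4 = runif(n),
  X5 = rnorm(n)
)

# Option 1: "." on the right-hand side means "every other column in data".
f_all <- Client ~ .

# Option 2: build the formula from column names, useful when only a
# subset should enter the model (e.g. X1..X150 but not an ID column).
predictors <- paste0("X", 1:5)  # paste0("X", 1:150) for the real data
f_explicit <- reformulate(predictors, response = "Client")

# Logistic regression: family = binomial uses the logit link.
fit <- glm(f_all, data = clients, family = binomial)

# type = "response" returns fitted probabilities P(Client = 1),
# i.e. a score between 0 and 1 for each client.
score <- predict(fit, type = "response")
head(score)

# New clients are scored the same way through 'newdata'.
new_score <- predict(fit, newdata = clients[1:3, ], type = "response")
```

The `.` shorthand is the shortest option, but it also picks up any ID or date column in the data frame, so either drop such columns first or use the explicit `reformulate()` version.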
My data looks like this: a variable `Client`, which is 0 or 1, and independent variables `X1` to `X150`, some of which are factors and some numeric.
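One way to handle the missing values, sketched in base R on hypothetical simulated data with the mixed numeric/factor layout described above: impute numeric columns with their median and factor columns with their most frequent level, or drop incomplete rows with `na.omit()`. The `randomForest` package provides `na.roughfix()` for the same rough imputation (usable as `na.action = na.roughfix`) and `rfImpute()` for proximity-based imputation; the hand-written helper `impute_rough()` below is only an illustration of the idea.

```r
# Hypothetical simulated data: a 0/1 'Client' column plus one numeric and
# one factor predictor, with a few NAs injected.
set.seed(42)
n <- 100
clients <- data.frame(
  Client = rbinom(n, 1, 0.3),
  X1 = rnorm(n),
  X2 = factor(sample(c("low", "mid", "high"), n, replace = TRUE))
)
clients$X1[c(3, 10, 57)] <- NA
clients$X2[c(5, 20)] <- NA

# Rough imputation: median for numeric columns, most frequent level for
# factor columns (ties go to the first level here; other types are skipped).
impute_rough <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (!anyNA(x)) next
    if (is.numeric(x)) {
      x[is.na(x)] <- median(x, na.rm = TRUE)
    } else if (is.factor(x)) {
      counts <- table(x)  # table() excludes NA by default
      x[is.na(x)] <- names(counts)[which.max(counts)]
    }
    df[[col]] <- x
  }
  df
}
imputed <- impute_rough(clients)

# Alternative: drop every row that has at least one NA.
complete <- na.omit(clients)

# randomForest() only does classification when the response is a factor;
# with a numeric 0/1 'Client' it fits a regression forest instead.
imputed$Client <- factor(imputed$Client, levels = c(0, 1))

if (requireNamespace("randomForest", quietly = TRUE)) {
  rf <- randomForest::randomForest(Client ~ ., data = imputed)
  head(predict(rf, newdata = imputed, type = "prob"))
}
```

Omitting rows is simpler but can discard a lot of data when 150 variables each have a few NAs, which is why imputation is usually preferred in that setting.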