I'm looking for R packages or machine learning models/algorithms, like randomForest, glmnet, gbdt, etc., that can handle NAs directly, as opposed to dropping every row or column that contains an NA. I'm not looking to impute. Any suggestions?
5

smci

screechOwl
1 Answer
4
The CART algorithm handles NAs rather seamlessly (rpart package): when the variable used for a split is missing, it falls back on surrogate splits. From there you can always turn to bagged trees built on rpart, probably via the ipred package.
I've heard that multivariate adaptive regression splines (mars in the mda package) handle missing data well, although I don't have much experience with it.
Also, k-nearest-neighbor models (and kernel methods more generally, I think) can be altered to deal with missing values in a fairly straightforward manner, though implementations may not do that out of the box. Presumably it would be as simple as adjusting the distance metric to only consider pairwise complete cases. I'm less familiar with specific R packages that go beyond the vanilla kNN models.
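To make the kNN idea concrete, here is a minimal sketch in plain Python (just to show the arithmetic; it is not how any particular R package does it). The names pairwise_complete_distance and knn_predict, and the toy data, are made up for illustration. The distance uses only the coordinates that are observed in both points, and two points with no shared coordinate are treated as infinitely far apart, which is an assumption on my part.

```python
import math

NA = math.nan  # stand-in for R's NA in this sketch


def pairwise_complete_distance(a, b):
    """Euclidean distance over coordinates observed (non-NaN) in both a and b.

    Returns math.inf when the two points share no observed coordinate.
    """
    shared = [(x, y) for x, y in zip(a, b)
              if not (math.isnan(x) or math.isnan(y))]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training rows nearest to query.

    Ties in the vote are broken arbitrarily.
    """
    order = sorted(range(len(train_X)),
                   key=lambda i: pairwise_complete_distance(train_X[i], query))
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)


# Toy data: every training row has at least one missing value.
train_X = [[1.0, 2.0, NA], [1.2, NA, 0.5], [8.0, 9.0, 7.5], [NA, 8.5, 8.0]]
train_y = ["a", "a", "b", "b"]

print(knn_predict(train_X, train_y, [1.1, 2.1, 0.4], k=2))
```

No row is dropped and nothing is imputed; each comparison simply uses whatever coordinates the two points have in common. Note that distances built from different numbers of coordinates are not on the same scale, so this raw version should be treated with some caution.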

joran
For kNN, "adjusting the distance metric to only consider pairwise complete cases" would be a mess: distances would be computed differently for with-NA and no-NA cases, and are not comparable. Especially when each distance component is weighted. – smci Oct 17 '17 at 22:19
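A small sketch of that comparability problem, in plain Python with hypothetical function names. raw_distance sums squared differences over shared coordinates only, so a point with missing values tends to look closer simply because fewer terms enter the sum. rescaled_distance applies one common heuristic correction, scaling the sum by (total coordinates / shared coordinates); that assumes the missing coordinates would have contributed about as much as the observed ones, and it does not address the weighted case.

```python
import math

NA = math.nan  # stand-in for R's NA in this sketch


def _shared_squares(a, b):
    """Squared differences over coordinates observed in both points."""
    return [(x - y) ** 2 for x, y in zip(a, b)
            if not (math.isnan(x) or math.isnan(y))]


def raw_distance(a, b):
    """Pairwise-complete Euclidean distance with no correction."""
    sq = _shared_squares(a, b)
    return math.sqrt(sum(sq)) if sq else math.inf


def rescaled_distance(a, b):
    """Same sum, scaled up by (total coordinates / shared coordinates)."""
    sq = _shared_squares(a, b)
    if not sq:
        return math.inf
    return math.sqrt(len(a) / len(sq) * sum(sq))


query = [0.0, 0.0, 0.0]
full = [1.0, 1.0, 1.0]     # differs from query by 1 in every coordinate
partial = [1.0, NA, NA]    # same difference, but only one coordinate observed

# Uncorrected: the partial row looks closer only because two terms are missing.
print(raw_distance(query, full), raw_distance(query, partial))

# Rescaled: both rows end up at the same distance from the query.
print(rescaled_distance(query, full), rescaled_distance(query, partial))
```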