Having recently completed DataCamp's course "Machine Learning Toolbox", I wanted to apply something I learned: caret can impute missing values using the argument preProcess = "medianImpute".
If I run table(complete.cases(df))
I get:
FALSE  TRUE
24429  6042
So I'll need to do something with missing values. The video made it look so simple!
mod.lm.medians <- train(target ~ .,
                        data = train,
                        trControl = train_control,
                        method = "lm",
                        preProcess = "medianImpute")
This gives:
Error in na.fail.default(list(target = c(5850000L, 6000000L, 5700000L, : missing values in object
I found another SO answer here which told me to try na.action = na.exclude,
which lets my model run, but only on the complete cases, which is not what I want.
Is my understanding of caret's preProcess argument incorrect? I expected that each missing value in df would be replaced with the median of its feature before the model is fitted. Instead I got this error.
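To make the expectation concrete, here is a minimal base-R sketch of what I assumed median imputation would do. The toy data frame and the impute_median helper are made up for illustration; they are not part of caret and not my real data.

```r
# Toy data frame; column names and values are hypothetical.
toy <- data.frame(
  target = c(10, 20, 30, 40),
  x1     = c(1, NA, 3, 5),
  x2     = c(NA, 2, 4, NA)
)

# Replace each NA in a numeric column with that column's median,
# computed from the non-missing values only.
impute_median <- function(col) {
  col[is.na(col)] <- median(col, na.rm = TRUE)
  col
}

# Apply the helper to every column and rebuild the data frame.
toy_imputed <- as.data.frame(lapply(toy, impute_median))

# After imputation, no row should contain a missing value.
table(complete.cases(toy_imputed))
```

With my real data I expected the same outcome: all 30471 rows usable by the model, not just the 6042 complete ones.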