
I tried to train a random forest with cross-validation, using the caret package:

### variable return_customer = binary variable
idx.train <- createDataPartition(y = known$return_customer, p = 0.8, list = FALSE)
train <- known[idx.train, ]
test <- known[-idx.train, ]
k <- 10
set.seed(123)
model.control <- trainControl(method = "cv", number = k, classProbs = TRUE, summaryFunction = twoClassSummary,  allowParallel = TRUE)
rf.parms <- expand.grid(mtry = 1:10)
rf.caret <- train(return_customer~., data = train, method = "rf", ntree = 500, tuneGrid = rf.parms, metric = "ROC", trControl = model.control)

When I run the train function, I get the following error, even though there are no missing values in return_customer:

Error in na.fail.default(list(return_customer = c(0L, 0L, 0L, 0L, 0L, : missing values in object

I want to understand why the function finds missing values in the data and how I can fix this. I am aware there are similar questions on the forum, but I could not fix my code with them. Thanks!

Cyrus Mohammadian
BADS_2016

1 Answer


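The missing values are most likely in your predictors. The formula interface of train builds a model frame from every variable in return_customer ~ ., and na.fail stops as soon as any of those columns contains an NA, not only the outcome.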
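To see which columns are responsible, count the missing values per column. Here is a minimal sketch on a small made-up data frame; `toy` and its columns `age` and `basket_value` are hypothetical stand-ins for `train` and its predictors:

```r
# Hypothetical stand-in for `train`: the outcome is complete, two predictors are not.
toy <- data.frame(
  return_customer = c(0L, 1L, 0L, 1L, 0L),
  age             = c(34, NA, 51, 28, 45),
  basket_value    = c(19.9, 42.5, NA, 7.0, NA)
)

# Number of NA values in each column.
na.per.column <- colSums(is.na(toy))

# Names of the columns that actually contain missing values.
cols.with.na <- names(na.per.column)[na.per.column > 0]

print(na.per.column)
print(cols.with.na)
```

Running the same two calls on `train` should show which predictors trigger the error.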

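Try this code to remove the rows that contain missing values, then pass the filtered data (not the original train) to the train function:

# TRUE for every row of train that has at least one NA
row.has.na <- apply(train, 1, function(x){any(is.na(x))})
# keep only the complete rows
predictors_no_NA <- train[!row.has.na, ]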
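The same filter can also be written with base R's `complete.cases()`, which avoids the row-wise `apply()`. This is a sketch on a made-up data frame; `toy` and its columns are hypothetical and only illustrate the idea:

```r
# Hypothetical stand-in for `train` with NAs in two different predictors.
toy <- data.frame(
  return_customer = c(0L, 1L, 0L, 1L),
  age             = c(34, NA, 51, 28),
  channel         = c("web", "shop", NA, "web")
)

# TRUE for rows that have no NA in any column.
keep <- complete.cases(toy)
toy.no.na <- toy[keep, ]

# na.omit() drops the same rows in a single call.
toy.no.na2 <- na.omit(toy)

print(toy.no.na)
```

Dropping rows discards data, so it is worth checking how many rows remain before training on the result.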

Hopefully it helps.

sm925
Shalini Baranwal