The issue is that when you specify `x` in `randomForest`, `x` should be "a data frame or a matrix of predictors, or a formula describing the model to be fitted". You are passing a vector, `Data_[, my_preds]`, where I assume `my_preds` is a string giving the column name. Selecting a single column of a data frame with `[` returns a vector by default.
You can use `drop = FALSE` to ensure that `x` stays a one-column data frame.
    RF_MODEL <- randomForest(x = Data_[, my_preds, drop = FALSE],
                             y = as.factor(Data_$P_A),
                             data = Data_,
                             ntree = 1000, importance = TRUE)
We can demonstrate using the `iris` dataset.
    library(randomForest)
    randomForest(x = iris[, "Sepal.Width"], y = iris$Species, data = iris)
    Error in if (n == 0) stop("data (x) has 0 rows") :
      argument is of length zero
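The error message is consistent with the function taking a row count from `x`. The following is a minimal base-R sketch of why that fails for a vector; it does not call `randomForest` at all, and the names `x_vec` and `x_df` are made up for illustration.

```r
# Default single-column indexing drops the result to a plain vector.
x_vec <- iris[, "Sepal.Width"]
# drop = FALSE keeps a one-column data frame instead.
x_df <- iris[, "Sepal.Width", drop = FALSE]

class(x_vec)        # "numeric"
class(x_df)         # "data.frame"

# A vector has no dim attribute, so nrow() returns NULL.
is.null(nrow(x_vec))  # TRUE
nrow(x_df)            # 150

# Comparing NULL with 0 gives a zero-length logical, which if() rejects
# with "argument is of length zero".
length(NULL == 0)     # 0
```

So the complaint is not really about zero rows: the row count is missing altogether because `x` is no longer a data frame or matrix.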
Using `drop = FALSE`:
    randomForest(x = iris[, "Sepal.Width", drop = FALSE], y = iris$Species, data = iris)

    Call:
     randomForest(x = iris[, "Sepal.Width", drop = FALSE], y = iris$Species, data = iris)
                   Type of random forest: classification
                         Number of trees: 500
    No. of variables tried at each split: 1

            OOB estimate of  error rate: 52.67%
    Confusion matrix:
               setosa versicolor virginica class.error
    setosa         31          2        17        0.38
    versicolor      3         20        27        0.60
    virginica      17         13        20        0.60
You might also consider using a formula to avoid this issue:
    randomForest(Species ~ Sepal.Width, data = iris)
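If the predictor names are held in a character vector, as `my_preds` appears to be in the question, one way to use the formula interface is to build the formula with `reformulate()` from base R. This is only a sketch: the two column names in `my_preds` and the names `rf_formula` and `rf_fit` are assumptions for illustration, and the model fit is skipped if the randomForest package is not installed.

```r
# Hypothetical predictor names; substitute your own columns.
my_preds <- c("Sepal.Width", "Petal.Length")

# Build Species ~ Sepal.Width + Petal.Length from the character vector.
rf_formula <- reformulate(my_preds, response = "Species")
print(rf_formula)

# Fit only if the package is available; ntree = 100 is an arbitrary choice.
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(1)  # results vary from run to run without a seed
  rf_fit <- randomForest::randomForest(rf_formula, data = iris, ntree = 100)
  print(rf_fit)
}
```

With a formula, the columns are looked up in `data`, so there is no subsetting step where a single column could be dropped to a vector.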