
I'm fitting a random forest with a single predictor.

  RF_MODEL <- randomForest(x=Data_[,my_preds], y=as.factor(Data_$P_A), data=Data_, ntree=1000, importance =T)

But I got this error message:

Error in if (n == 0) stop("data (x) has 0 rows") : 
  argument is of length zero

Does this mean that we can't use RF with one variable?

neilfws
user1988

1 Answer


The issue is that the x argument of randomForest should be "a data frame or a matrix of predictors, or a formula describing the model to be fitted". You are passing a vector: Data_[, my_preds], where I assume my_preds is a character string naming a single column. By default, selecting one column of a data frame with [, ] returns a vector rather than a data frame.

You can use drop = FALSE to ensure that x remains a one-column data frame.

RF_MODEL <- randomForest(x = Data_[,my_preds, drop = FALSE], 
                         y = as.factor(Data_$P_A), 
                         data = Data_, 
                         ntree = 1000, importance = TRUE)

We can demonstrate using the iris dataset.

library(randomForest)

randomForest(x = iris[, "Sepal.Width"], y = iris$Species, data = iris)

Error in if (n == 0) stop("data (x) has 0 rows") : 
  argument is of length zero
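
The wording of that error can be traced to how a plain vector behaves. As far as I can tell, randomForest takes the row count of x with nrow() before testing it against zero; that is my reading of the message, not something I have confirmed for every package version. The sketch below uses only base R and the iris data to show the difference between the two kinds of subsetting:

```r
# Sketch using the built-in iris data; no packages required.
vec <- iris[, "Sepal.Width"]                # single column: dropped to a vector
df  <- iris[, "Sepal.Width", drop = FALSE]  # single column: kept as a data frame

class(vec)  # "numeric"
class(df)   # "data.frame"

nrow(vec)   # NULL, because a plain vector has no dim attribute
nrow(df)    # 150

# Comparing NULL with 0 gives a zero-length logical, which if() cannot evaluate.
length(NULL == 0)  # 0
```

So the message does not mean the data are empty, and it does not mean a single predictor is unsupported. It only means x arrived without dimensions.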

Using drop = FALSE:

randomForest(x = iris[, "Sepal.Width", drop = FALSE], y = iris$Species, data = iris)

Call:
 randomForest(x = iris[, "Sepal.Width", drop = FALSE], y = iris$Species,      data = iris) 
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 1

        OOB estimate of  error rate: 52.67%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         31          2        17        0.38
versicolor      3         20        27        0.60
virginica      17         13        20        0.60

You might also consider using a formula to avoid this issue:

randomForest(Species ~ Sepal.Width, data = iris)
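
If the predictor names are held in a character vector, as my_preds appears to be in the question, the formula can be built programmatically with reformulate() from the stats package. This is a minimal sketch on iris: my_preds here is a hypothetical stand-in for the question's variable, and the seed is arbitrary. For classification the response column must already be a factor, so with the question's data you would convert P_A with as.factor() first.

```r
library(randomForest)

# Hypothetical stand-in for the question's my_preds (one column name).
my_preds <- "Sepal.Width"

# Build the formula Species ~ Sepal.Width from the character vector.
f <- reformulate(my_preds, response = "Species")

set.seed(1)  # arbitrary seed, only to make the run repeatable
fit <- randomForest(f, data = iris)
fit$type     # "classification", since Species is a factor
```

The same call works unchanged if my_preds holds several column names, since reformulate() joins them with +.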
neilfws