I'm new to Random Forests in R, and I'm trying to make a prediction. I built a Random Forest model with the following code, which runs without errors:

library(randomForest)
RF_model = randomForest(trainrows[,col_truth]~.
                    ,data = trainrows[,cols_to_use]
                    ,ntree=100
                    ,do.trace=T)

If I print RF_model, I get the following output:

Call:
 randomForest(formula = trainrows[, col_truth] ~ ., data = trainrows[,      cols_to_use], ntree = 100, do.trace = T) 
               Type of random forest: classification
                     Number of trees: 100
No. of variables tried at each split: 4

        OOB estimate of  error rate: 19.23%
Confusion matrix:
     0    1 class.error
0 7116 1640   0.1873001
1 1725 7015   0.1973684

Then, when I try to make a prediction with the model, I get the following error:

> predict(RF_model)
Error in 1:dim(data)[1] : argument of length 0

I have tried supplying data to the predict method, but I get the same error. Does anyone know what's going on and how to fix it?

EDIT

In order to provide some more data, I have tried using Random Forests with the iris dataset.

rf = randomForest(iris[,1]~., data=iris[,c(1, 2)], ntree=100)
predict(rf)
Error in 1:dim(data)[1] : argument of length 0

This is not related to my data, but a problem with my version of R, I think. Any ideas?

Jon
  • Please include sample data to make your example [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Feel free to use a built-in data set, but unless we can run the same code and get the same error, it's difficult to help. – MrFlick Jun 27 '14 at 00:23
  • `rf = randomForest(iris[,1]~., data=iris[,c(1, 2)], ntree=100) ; predict(rf)` works fine, so this issue is probably specific to your dataset. Please include a reproducible example. – josliber Jun 27 '14 at 00:47
  • If I had to guess, the problem is likely related to your formula specification, which follows none of the conventions for specifying formulas in R. Formulas contain names of columns. DO NOT mix subsetting into your formulas. Ever. – joran Jun 27 '14 at 01:48
  • I have just adjusted my question showing more data – Jon Jun 27 '14 at 16:22

1 Answer

The predict function predicts the outcome (the labels) for new data, so pass your test set as its second argument:

rf_predict <- predict(RF_model, test_set)

You can then assess the accuracy of your random forest by building a confusion matrix with the table function:

table(observed, rf_predict)

Note: observed holds the correct labels for the test set.
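
For completeness, here is a minimal end-to-end sketch of those two steps on the built-in iris data. It assumes a 70/30 train/test split and a fixed seed, neither of which comes from the question, and it names the response column in the formula (Species ~ .) rather than subsetting inside it, as joran's comment suggests. The names test_idx, train_set, conf_mat and accuracy are only illustrative. Nothing in this thread confirms that this resolves the error in the question.

```r
library(randomForest)

set.seed(42)  # assumption: fixed seed so the split and the forest are repeatable

# Assumption: hold out 30% of iris as a test set (the question has no such split)
test_idx  <- sample(nrow(iris), size = round(0.3 * nrow(iris)))
train_set <- iris[-test_idx, ]
test_set  <- iris[test_idx, ]

# Name the response column in the formula and pass the whole data frame as `data`
rf <- randomForest(Species ~ ., data = train_set, ntree = 100)

# Predict labels for the held-out rows
rf_predict <- predict(rf, newdata = test_set)

# Confusion matrix: rows are the true labels, columns are the predicted labels
observed <- test_set$Species
conf_mat <- table(observed, rf_predict)
print(conf_mat)

# Share of test rows that were classified correctly
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(accuracy)
```

The diagonal of the table counts the correctly classified rows, so dividing its sum by the total gives the test-set accuracy.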

Barb