
I built a model using Random Forest and tried to test it on another dataset using predict(). However, it only returns NA.

library(randomForest)

RF <- randomForest(intention ~ ., data = train, ntree = 1000, na.action = na.roughfix)
# neither the train nor the test dataset contains NA

# Predicting
pred <- predict(RF, newdata = test, type = "response")
# the pred vector contains only NA

I checked this page and confirmed that my datasets have no NA, but I keep getting the same result: https://www.kaggle.com/c/the-analytics-edge-mit-15-071x/discussion/7808

I also checked this page, but it doesn't seem to apply to Random Forest (or I do not understand it): r - loess prediction returns NA

Thanks for your help!

  • Does `test` contain all the same variable names that `train` does? You haven't shown us your data, so you will only get guesses here, not answers. – Allan Cameron Sep 07 '20 at 20:40
  • Indeed you touched on the point, @Allan Cameron. I ran a loop on test and train (as shown in another post on Stack Overflow) to remove columns with too few values, and it created asymmetric datasets (the loop removed 5 columns on train but 9 on test)! Thank you for your guess, I did not know it could stem from there. – Grison Mayliss Sep 08 '20 at 08:37
  • I thought it would give an error if the problem came from the data. Why doesn't this give an error? – Grison Mayliss Sep 08 '20 at 08:45

1 Answer


As @Allan Cameron guessed, the problem came from the asymmetry of the datasets. Having had issues running the RF algorithm, I had found advice on this forum to remove variables with too few distinct values, using the following code.

index <- c()
for (j in 1:41) {
  # flag numeric columns that contain only a single distinct value
  if (is.numeric(train[, j]) & length(unique(as.numeric(train[, j]))) == 1) {
    index <- append(index, j)
  }
}
if (length(index) > 0) train <- train[, -index]
# ran on the test dataset too

However, I did not notice that it removed 5 columns from train but 9 from test. predict() then tried to apply a model built on 51 variables to a data set with 47 variables, which returned NA but no error.
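
To avoid this, the columns to drop should be determined once, on train only, and then removed from both datasets so they stay symmetric. Below is a sketch of that pattern, reusing the `train`, `test` and `intention` names from above (the setdiff() lines are only an optional sanity check, not required for the fix):

library(randomForest)

# decide which columns to drop based on train only
drop_cols <- names(train)[sapply(train, function(x) is.numeric(x) && length(unique(x)) == 1)]

# drop exactly the same columns from both datasets
train <- train[, !(names(train) %in% drop_cols)]
test  <- test[, !(names(test) %in% drop_cols)]

# sanity check: both calls should return nothing
# (or only "intention" if test has no response column)
setdiff(names(train), names(test))
setdiff(names(test), names(train))

RF   <- randomForest(intention ~ ., data = train, ntree = 1000, na.action = na.roughfix)
pred <- predict(RF, newdata = test, type = "response")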