When I run my random forest (intended for classification), I get this warning:

Warning message:
In randomForest.default(m, y, ...) :
  The response has five or fewer unique values.  Are you sure you want to do regression?

I already cleaned my (huge) dataset with the janitor package and tried to convert the character variables to factors. Does anyone understand why I still get this warning?

library(randomForest)

data2 <- experimental_data

x <- janitor::clean_names(data2)

#--------------------------------------

#Partition data
set.seed(93)
ind <- sample(2, nrow(x), replace = TRUE, prob = c(0.7, 0.3))
train <- x[ind == 1, ]
test <- x[ind == 2, ]

str(train)
#Convert character columns to factors (training set only)
chr_cols <- sapply(train, is.character)
train[chr_cols] <- lapply(train[chr_cols], as.factor)
str(train)
#Train Random forest on UCI heart dataset
#(predict.all is an argument of predict(), not randomForest(), so it is omitted here)
rf <- randomForest(y_full ~ ., data = train, importance = TRUE, proximity = TRUE)
  • The warning means you have at most 5 classes. You should consider classification rather than regression – Onyambu Jun 12 '22 at 14:09
  • RomanyDekker, it's a little difficult to know for certain the resolution for this; while I suspect onyambu's comment is correct, we don't know what your data looks like. Please skim through https://stackoverflow.com/q/5963269, the guidance on creating a minimal reproducible example, and https://stackoverflow.com/tags/r/info, notably about how to share sample data in a way that we can easily use. Thanks! – r2evans Jun 12 '22 at 14:15
  • I thought randomForest performs classification by default? How do I check which mode it is using, or make randomForest do classification instead of regression? – Romany Dekker Jun 13 '22 at 09:37
  • The data I use can be found at this link https://dataverse.harvard.edu/file.xhtml?fileId=5799386&version=1.1 – Romany Dekker Jun 13 '22 at 09:40
  • Hi guys, I fixed it: the variables need to be converted to factors BEFORE splitting the data. Thanks for your time anyway. – Romany Dekker Jun 13 '22 at 10:14
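
For anyone landing here with the same warning, the fix described in the comments can be sketched roughly as below. This is only an illustration under assumptions: the toy data frame and its `age` and `group` columns are made up, and the response `y_full` is assumed to be coded 0/1 (the real data may differ). randomForest runs classification when the response is a factor and regression when it is numeric, so the sketch converts the response explicitly and does all factor conversion before the split.

```r
# Toy stand-in for the real data (hypothetical columns); y_full is assumed to be coded 0/1.
set.seed(93)
n <- 200
x <- data.frame(
  age    = rnorm(n, mean = 50, sd = 10),
  group  = sample(c("a", "b", "c"), n, replace = TRUE),
  y_full = sample(c(0, 1), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Convert character predictors and the response to factors BEFORE splitting,
# so train and test carry the same factor levels.
is_chr <- sapply(x, is.character)
x[is_chr] <- lapply(x[is_chr], as.factor)
x$y_full <- as.factor(x$y_full)

# Partition data (same scheme as in the question)
ind <- sample(2, nrow(x), replace = TRUE, prob = c(0.7, 0.3))
train <- x[ind == 1, ]
test  <- x[ind == 2, ]

# Quick check: a factor response means randomForest will do classification
stopifnot(is.factor(train$y_full))

# Fit only if the randomForest package is installed
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf <- randomForest::randomForest(y_full ~ ., data = train,
                                   importance = TRUE, proximity = TRUE)
  print(rf$type)  # reports which mode the forest was fitted in
}
```

Checking `class(train$y_full)` or `str(train)` before fitting is a cheap way to see which mode you will get; if it says `num` or `int`, the model will be a regression forest.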