I'm dealing for the first time with random forests and I'm having some troubles that I can't figure out.. When I run the analysis on all my dataset (about 3000 rows) I don't get any error message. But when I perform the same analysis on a subset of my dataset (about 300 rows) I get an error:
dataset <- read.csv("datasetNA.csv", sep=";", header=T)
names (dataset)
dataset2 <- dataset[complete.cases(dataset$response),]
library(randomForest)
dataset2 <- na.roughfix(dataset2)
data.rforest <- randomForest(dataset2$response ~ dataset2$predictorA + dataset2$predictorB+ dataset2$predictorC + dataset2$predictorD + dataset2$predictorE + dataset2$predictorF + dataset2$predictorG + dataset2$predictorH + dataset2$predictorI, data=dataset2, ntree=100, keep.forest=FALSE, importance=TRUE)
# subset of my original dataset:
groupA<-dataset2[dataset2$order=="groupA",]
data.rforest <- randomForest(groupA$response ~ groupA$predictorA + groupA$predictorB+ groupA$predictorC + groupA$predictorD + groupA$predictorE + groupA$predictorF + groupA$predictorG + groupA$predictorH + groupA$predictorI, data=groupA, ntree=100, keep.forest=FALSE, importance=TRUE)
Error in randomForest.default(m, y, ...) : Can't have empty classes in y.
However, my response variable hasn't any empty class.
If instead I write randomForest like this (a+b+c,y)
instead than (y ~ a+b+c)
I get this other message:
Error in if (n == 0) stop("data (x) has 0 rows") :
argument length zero
Warning messages:
1: In Ops.factor(groupA$responseA + groupA$responseB, :
+ not meaningful for factors
The second problem is that when I try to impute my data through rfImpute()
I get an error:
Errore in na.roughfix.default(x) : roughfix can only deal with numeric data
However my columns are all factors and numeric.
Can somebody see where I'm wrong???