I am trying to calculate the RMSE of a Random Forest regression like this:
RMSE <- (sum((RFestimated-model1$y)^2)/length(model1$y))^(1/2)
where model1 is the regression model from a Random Forest, model1$y is the value being predicted from the training data, and RFestimated is the predicted value from the test data.
The two vectors have different lengths. Is there a trick to making the lengths equal?
These are my steps:
# sample 80% of the data for training -random sample
train_index <- sample(1:nrow(beijingData), 0.8 * nrow(beijingData))
# take the difference as data to test the model
test_index <- setdiff(1:nrow(beijingData), train_index)
#create Train and Test data sets based on the indexes above.
dataTrain <- beijingData[train_index,]
dataTest <- beijingData[test_index,]
#check the datasets dimensions
dim(dataTrain)
dim(dataTest)
> dim(dataTrain)
[1] 33405 13
> dim(dataTest)
[1] 8352 13
#set seed
set.seed(100)
#create a random forest regression model
model1 <- randomForest(pm2.5 ~ ., data = dataTrain, ntree = 500, importance = TRUE)
model1
#predict with test data
RFestimated <- predict(model1, dataTest)
[1] 118.7794
> length(RFestimated)
[1] 8352
> length(model1$y)
[1] 33405
qqnorm((RFestimated - model1$y)/sd(RFestimated-model1$y))
qqline((RFestimated-model1$y)/sd(RFestimated-model1$y))
#results of the last two statements above
> qqnorm((RFestimated - model1$y)/sd(RFestimated-model1$y))
Warning messages:
1: In RFestimated - model1$y :
longer object length is not a multiple of shorter object length
2: In RFestimated - model1$y :
longer object length is not a multiple of shorter object length
>
> qqline((RFestimated-model1$y)/sd(RFestimated-model1$y))
Warning messages:
1: In RFestimated - model1$y :
longer object length is not a multiple of shorter object length
2: In RFestimated - model1$y :
longer object length is not a multiple of shorter object length
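
A possible reading of the warnings, sketched under assumptions: model1$y holds the 33405 training responses, while RFestimated holds predictions for the 8352 test rows, so the two vectors are not paired and R recycles the shorter one. Comparing the predictions with the observed test response (presumably dataTest$pm2.5) would give two vectors of equal length. The helper name rmse and the toy vectors below are illustrative only and are not part of the original code.

```r
# Hypothetical helper: RMSE between observed and predicted values.
# It stops with an error if the two vectors are not the same length,
# instead of letting R silently recycle the shorter one.
rmse <- function(observed, predicted) {
  stopifnot(length(observed) == length(predicted))
  sqrt(mean((observed - predicted)^2))
}

# Toy stand-ins for dataTest$pm2.5 and predict(model1, dataTest)
observed  <- c(10, 20, 30, 40)
predicted <- c(12, 18, 33, 39)
rmse(observed, predicted)

# With the objects from the question, the call would presumably be:
# rmse(dataTest$pm2.5, RFestimated)
```

The same pairing would apply to the residuals passed to qqnorm and qqline, that is RFestimated - dataTest$pm2.5 rather than RFestimated - model1$y.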