I recently wrote a script that trains a random forest model to classify land use/cover types with the randomForest package in R. When I run the script 10 times, I get a different overall accuracy and kappa statistic each time. I now want to retrain the model using K-fold cross-validation, but I don't know how to do this or how to find an optimal model. If I retrain the model with K-fold cross-validation, how can I get the average overall accuracy and kappa statistic?
Does anyone have experience with this or a worked example? Any help would be much appreciated. Thank you very much.
My code is as follows:
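#caret attaches ggplot2, whose margin() would mask randomForest::margin(),
#so randomForest is loaded last
library(caret)         # confusionMatrix(), varImp()
library(e1071)         # classAgreement()
library(randomForest)  # randomForest(), margin(), importance(), varImpPlot()
cat("Calculating random forest object\n")
randfor <- randomForest(as.factor(response) ~ ., data = trainvals, importance = TRUE, na.action = na.omit, proximity = TRUE)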
#print the randomForest model (OOB error estimate and confusion matrix)
print(randfor)
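#Margin of each training observation: a positive value means
#correct classification, a negative value means misclassification.
#margin() uses the votes and classes stored in the fitted model, so no test response is passed
rf.margin <- randomForest::margin(randfor)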
plot(rf.margin)
#display the error rates of the randomForest model against the number of trees
plot(randfor)
#Predict the land cover type of the test datasets
pred <- predict(randfor,newdata = trainvalsTest)
#generate a classification table for the testing datasets
rf.table <- table(pred,responseTest)
rf.table
# Plot the variable importance
varImpPlot(randfor)
# Agreement statistics (including kappa) from the classification table
classAgreement(rf.table)
#Print the value of overall accuracy and Kappa Statistic
confusion <- confusionMatrix(pred,responseTest)
confusion
#print the importance of all the input variables
randomForest.importance <- importance(randfor)
randomForest.importance
#using caret package to calculate the variable importance
caret.importance <- varImp(randfor,scale = FALSE)
#print the overall importance of the input variables
print(caret.importance)
#display the variable importance plot
plot(caret.importance)
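
A minimal sketch of what the K-fold version might look like, assuming caret's `train()` with `method = "rf"` is an acceptable wrapper around randomForest. The built-in `iris` data stands in for `trainvals` and `response` (an assumption for illustration only), and the seed value and `tuneLength` are arbitrary choices. Fixing the seed should also make repeated runs return the same accuracy and kappa.

```r
library(caret)
library(randomForest)

# Fix the seed so the fold assignment and the forests are reproducible
set.seed(42)

# 10-fold cross-validation; keep the held-out predictions of the best model
ctrl <- trainControl(method = "cv", number = 10, savePredictions = "final")

# iris is a stand-in for the land cover training data (assumption);
# tuneLength = 3 asks caret to try three candidate values of mtry
cv.fit <- train(Species ~ ., data = iris, method = "rf",
                trControl = ctrl, tuneLength = 3, importance = TRUE)

# Accuracy and Kappa of every fold for the selected mtry
print(cv.fit$resample)

# Average overall accuracy and kappa across the 10 folds
cv.mean <- colMeans(cv.fit$resample[, c("Accuracy", "Kappa")])
print(cv.mean)

# Selected mtry and the randomForest object refit on all training rows
print(cv.fit$bestTune)
print(cv.fit$finalModel)
```

`cv.fit$results` lists the mean Accuracy and Kappa for each candidate `mtry`; caret keeps the value with the highest accuracy and refits `finalModel` on the full training data. A separate test set can still be scored with `predict(cv.fit, newdata = ...)` followed by `confusionMatrix()`.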