I am trying to cross-validate a random forest model on my data set.

My response variable is a factor with 2 levels (1, 2).

I am using the code below for the cross-validation:

library(plyr)          # provides create_progress_bar()
library(randomForest)  # provides randomForest()

k = 10

Imputed_data$id <- sample(1:k , nrow(Imputed_data), replace = TRUE)
list <- 1:k

prediction <- data.frame()
testsetcopy <- data.frame()

progress.bar <- create_progress_bar("text")
progress.bar$init(k)

for (i in 1:k){

  trainingset <- subset(Imputed_data,id %in% list[-i])
  testset <- subset(Imputed_data, id %in% c(i))

  # run a random forest model
  mymodel <- randomForest(trainingset$Accepted~ ., data = trainingset)


  temp <- as.data.frame(predict(mymodel, testset[,-13]))

  prediction <- rbind(prediction, temp)


  testsetcopy <- rbind(testsetcopy, as.data.frame(testset[,13]))

  progress.bar$step()
}

result <- cbind(prediction, testsetcopy[,1])
names(result) <- c("Predicted", "Actual")

result$Difference <-abs(result$Actual-result$Predicted)


summary(result$Difference)

I am getting an error on this line:

result$Difference <-abs(result$Actual-result$Predicted)

In Ops.factor(result$Actual, result$Predicted) : ‘-’ not meaningful for factors

I understand that neither abs nor - can be applied to factors.

I am new to R, and I am unsure how else I could calculate my result. Any pointers would be helpful.

Mikz

1 Answer

As you noted, you can't subtract factors, nor can you use abs on them.

The best way to show your results is in a cross table, e.g.:

table(result$Predicted, result$Actual)

Or use caret's confusionMatrix function:

confusionMatrix(result$Predicted, result$Actual)
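
If you want a single number in place of the abs difference, the share of correct predictions can be computed directly from the factors. Below is a minimal base R sketch; the toy vectors `actual` and `predicted` are made-up stand-ins for your `result` columns, not your real data.

```r
# Toy stand-ins for the two columns of `result` (values are invented for illustration)
actual    <- factor(c(1, 1, 2, 2, 1, 2), levels = c(1, 2))
predicted <- factor(c(1, 2, 2, 2, 1, 1), levels = c(1, 2))

# Cross table: rows are predictions, columns are the true classes
conf_tab <- table(Predicted = predicted, Actual = actual)
print(conf_tab)

# Share of correct predictions; == works on factors that share the same levels
accuracy <- mean(predicted == actual)

# The same number read off the table: diagonal cells are the correct predictions
accuracy_from_table <- sum(diag(conf_tab)) / sum(conf_tab)
```

The misclassification rate is then `1 - accuracy`, which plays the role the abs difference was meant to play.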
benn
  • When using table(prediction, testsetcopy[,1]), I get an error: "Error in sort.list(y) : 'x' must be atomic for 'sort.list' Have you called 'sort' on a list?" – Mikz Feb 27 '18 at 14:27
  • Can you check your objects `prediction` and `testsetcopy[,1]`? They should be factors. `class(prediction)` or `summary(prediction)` would be helpful. – benn Feb 27 '18 at 14:41
  • summary(prediction) gives 1: 3540; 2: 1054, and class(prediction) says it is a data frame – Mikz Feb 27 '18 at 14:43
  • Shouldn't I use confusionMatrix(result$Predicted, result$Actual) instead? That gives me the result. – Mikz Feb 27 '18 at 14:45
  • Yes, should be good that way! – benn Feb 27 '18 at 14:58
  • You could edit your answer, and I will accept it. – Mikz Feb 27 '18 at 15:27
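
A note on the error from the comments: `prediction` is a data frame, not a factor, so passing it to `table()` together with a vector fails. Pulling the column out first avoids this. The sketch below uses a small invented `prediction` object as a stand-in for the one built in the loop; the exact error text may differ between R versions.

```r
# Hypothetical stand-in for `prediction`: a one-column data frame holding a factor
prediction <- data.frame(pred = factor(c(1, 2, 2, 1), levels = c(1, 2)))
actual     <- factor(c(1, 2, 1, 1), levels = c(1, 2))

class(prediction)       # a data frame
class(prediction[[1]])  # the column inside it is the factor

# Passing the whole data frame next to a vector is expected to fail;
# tryCatch keeps the script running and stores the message if it does
res <- tryCatch(table(prediction, actual),
                error = function(e) conditionMessage(e))

# Extracting the column first gives table() two factors of equal length
tab <- table(Predicted = prediction[[1]], Actual = actual)
print(tab)
```

This is also why going through `result$Predicted` works: `$` returns the column itself, not a data frame.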