I am trying to measure the accuracy of a randomForest model, but I get the following error:
Error: data and reference should be factors with the same levels.

This is the code that produces it:

rfModel <- randomForest(Churn ~., data = training)
print(rfModel)
pred_rf <- predict(rfModel, testing)
caret::confusionMatrix(pred_rf, testing$Churn)
testing$Churn

The data were split into training and test sets at a ratio of 7:3.

I also received the following warnings while running the code:

Warning messages:
1: In get(results[[i]], pos = which(search() == packages[[i]])) :
  restarting interrupted promise evaluation
2: In get(results[[i]], pos = which(search() == packages[[i]])) :
  internal error -3 in R_decompress1

Structure of the test data:

str(testing)
'data.frame':   999 obs. of  18 variables:
 $ account_length        : int  84 75 147 141 65 62 85 93 76 73 ...
 $ International.plan    : Factor w/ 2 levels "No","Yes": 2 2 2 2 1 1 1 1 1 1 ...
 $ Voice.mail.plan       : Factor w/ 2 levels "No","Yes": 1 1 1 2 1 1 2 1 2 1 ...
 $ Number.vmail.messages : int  0 0 0 37 0 0 27 0 33 0 ...
 $ Total.day.minutes     : num  299 167 157 259 129 ...
 $ Total.day.calls       : int  71 113 79 84 137 70 139 114 66 90 ...
 $ Total.day.charge      : num  50.9 28.3 26.7 44 21.9 ...
 $ Total.eve.minutes     : num  61.9 148.3 103.1 222 228.5 ...
 $ Total.eve.calls       : int  88 122 94 111 83 76 90 111 65 88 ...
 $ Total.eve.charge      : num  5.26 12.61 8.76 18.87 19.42 ...
 $ Total.night.minutes   : num  197 187 212 326 209 ...
 $ Total.night.calls     : int  89 121 96 97 111 99 75 121 108 74 ...
 $ Total.night.charge    : num  8.86 8.41 9.53 14.69 9.4 ...
 $ Total.intl.minutes    : num  6.6 10.1 7.1 11.2 12.7 13.1 13.8 8.1 10 13 ...
 $ Total.intl.calls      : int  7 3 6 5 6 6 4 3 5 2 ...
 $ Total.intl.charge     : num  1.78 2.73 1.92 3.02 3.43 3.54 3.73 2.19 2.7 3.51 ...
 $ Customer.service.calls: int  2 3 0 0 4 4 1 3 1 1 ...
 $ Churn                 : chr  "0" "0" "0" "0" ...

The training set has the same structure and contains 2334 observations.

Structure of pred_rf:

 str(pred_rf)
 Factor w/ 2 levels "FALSE","TRUE": 1 1 1 1 2 2 1 1 1 1 ...
 - attr(*, "names")= chr [1:999] "4" "5" "8" "10" ...

Please help me out.

Has QUIT--Anony-Mousse
Dishant Shetty
  • Can you provide minimal data to reproduce the error? I would suspect that the factor levels between training and testing dataset are different, but I have no idea what your data looks like – George Jul 27 '18 at 00:07
  • Possible duplicate of [Error in ConfusionMatrix the data and reference factors must have the same number of levels R CARET](https://stackoverflow.com/questions/24801452/error-in-confusionmatrix-the-data-and-reference-factors-must-have-the-same-numbe) – Andrew Chiu Jul 27 '18 at 00:15
  • @George I have provided the structure of my data in my question. Please have a look again. Thank you – Dishant Shetty Jul 27 '18 at 00:59
  • You definitely need to post a subset of your data, e.g. using `dput`, so the problem is reproducible and people can work through it and provide you with an answer. Thanks :) – mysteRious Jul 27 '18 at 04:39

1 Answer

OK, I just had this same problem and figured it out.

Look at your str(testing) output: Churn is not a factor but a chr.

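First, convert Churn to a factor inside the testing data frame (assigning the result to a separate Churn variable would leave testing itself unchanged):

testing$Churn <- as.factor(testing$Churn)

Check str(testing) again to confirm that it has in fact changed.

Now you can use:

# predict on the test set with the model fitted in the question
test_predictions <- predict(rfModel, testing)
test_predictions

# compare the predictions against the reference factor
conf_matrix <- caret::confusionMatrix(test_predictions, testing$Churn)
conf_matrix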
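
One caveat for the data shown in the question: pred_rf has the levels "FALSE" and "TRUE", while Churn holds the strings "0" and "1", so a plain conversion to factor may still leave the two sets of levels different. Below is a minimal sketch of relabelling the reference so the levels line up. The toy vectors are made up for illustration, and the mapping of "0" to FALSE and "1" to TRUE is an assumption that should be checked against how Churn was coded in the training set.

```r
# Toy stand-ins for testing$Churn and pred_rf (values invented for illustration).
reference_chr <- c("0", "0", "1", "0", "1", "1")
pred <- factor(c("FALSE", "FALSE", "TRUE", "TRUE", "TRUE", "FALSE"),
               levels = c("FALSE", "TRUE"))

# A plain conversion gives levels "0"/"1", which still differ from "FALSE"/"TRUE".
naive_ref <- as.factor(reference_chr)
identical(levels(naive_ref), levels(pred))   # FALSE

# Assumption: "0" means no churn (FALSE) and "1" means churn (TRUE).
reference <- factor(reference_chr,
                    levels = c("0", "1"),
                    labels = c("FALSE", "TRUE"))
identical(levels(reference), levels(pred))   # TRUE

# Base-R cross-tabulation of predictions against the reference.
conf_table <- table(Prediction = pred, Reference = reference)
print(conf_table)

# Once the levels agree, the same pair of factors can be handed to
# caret::confusionMatrix(pred, reference) as in the question.
```

Comparing levels(pred_rf) with levels(testing$Churn) before calling confusionMatrix is a quick way to see whether this relabelling step is needed at all.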

See: https://community.rstudio.com/t/how-to-deal-with-rlang-errors/27248

mccurcio