How to solve "The data cannot have more levels than the reference" error when using confusionMatrix?

Question

I'm using R. I split the data into training and test sets to estimate prediction accuracy.

This is my code:

library("tree")
credit<-read.csv("C:/Users/Administrator/Desktop/german_credit (2).csv")

library("caret")
set.seed(1000)

intrain<-createDataPartition(y=credit$Creditability,p=0.7,list=FALSE)
train<-credit[intrain, ]
test<-credit[-intrain, ]

treemod<-tree(Creditability~. , data=train)
plot(treemod)
text(treemod)

cv.trees<-cv.tree(treemod,FUN=prune.tree)
plot(cv.trees)

prune.trees<-prune.tree(treemod,best=3)
plot(prune.trees)
text(prune.trees,pretty=0)

install.packages("e1071")
library("e1071")
treepred<-predict(prune.trees, newdata=test)

confusionMatrix(treepred, test$Creditability)

The call to confusionMatrix produces the following error message:

Error in confusionMatrix.default(rpartpred, test$Creditability) : the data cannot have more levels than the reference

The credit data can be downloaded from this site:
http://freakonometrics.free.fr/german_credit.csv

Please include all the relevant data in your post. Linking to an off-site resource makes this question very localized, especially in time, because the link may stop working. A simulated dataset is a very convenient way of conveying what is going on with your data. See [this page](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for tips on how to share your data and code in a reproducible way. — Roman Luštrik, Aug 03 '16 at 14:02

phiver · Answer 1 · 2016-08-03T12:21:08.213

If you look carefully at your plots, you will see that you are training a regression tree and not a classification tree.

If you run credit$Creditability <- as.factor(credit$Creditability) after reading in the data and use type = "class" in the predict function, your code should work.

code:

credit <- read.csv("http://freakonometrics.free.fr/german_credit.csv" )

credit$Creditability <- as.factor(credit$Creditability)

library(caret)
library(tree)
library(e1071)

set.seed(1000)
intrain <- createDataPartition(y = credit$Creditability, p = 0.7, list = FALSE)
train <- credit[intrain, ]
test <- credit[-intrain, ]

treemod <- tree(Creditability ~ ., data = train)

cv.trees <- cv.tree(treemod, FUN = prune.tree)
plot(cv.trees)

prune.trees <- prune.tree(treemod, best = 3)
plot(prune.trees)
text(prune.trees, pretty = 0)

treepred <- predict(prune.trees, newdata = test, type = "class")
confusionMatrix(treepred, test$Creditability)

More or less, the code then predicts the probability that each entry in `test` belongs to class '0' or '1', so the OP has to convert these predicted probabilities to predicted classifications. — StatMan, Aug 03 '16 at 12:07
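The conversion mentioned in the comment above could look roughly like this. It is a minimal sketch in base R only: the values in `pred_num` and `reference` are made up for illustration, and the 0.5 cutoff is an assumption, not something taken from the question.

```r
# Hypothetical numeric predictions, like those a regression tree returns
pred_num <- c(0.82, 0.31, 0.67, 0.12, 0.55)

# Hypothetical true classes, stored as a factor with levels "0" and "1"
reference <- factor(c(1, 0, 1, 0, 0), levels = c(0, 1))

# Threshold at 0.5 (assumed cutoff) and reuse the levels of the reference,
# so that predictions and reference share exactly the same level set
pred_class <- factor(ifelse(pred_num > 0.5, 1, 0), levels = levels(reference))

# Cross-tabulate predictions against the reference
table(Prediction = pred_class, Reference = reference)
```

Once both objects are factors with the same levels, they can be passed to confusionMatrix in the same way as in the answer's code.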

score 2 · Answer 2 · answered Dec 13 '19 at 10:54

I had the same issue in classification. It turned out that one specific group had zero observations, which is why I got the error "the data cannot have more levels than the reference".

Make sure that all groups in your test set also appear in your training set.
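One way to check for such a mismatch before calling confusionMatrix is to compare the factor levels directly. The sketch below uses base R only; `pred` and `reference` are made-up vectors for illustration, not the data from the question.

```r
# Hypothetical example: the predictions contain a level the reference lacks
reference <- factor(c("a", "b", "a", "b"))
pred <- factor(c("a", "b", "c", "a"))

# Levels present in the predictions but missing from the reference
extra <- setdiff(levels(pred), levels(reference))
extra

# Put both factors on one common level set before comparing them
all_levels <- union(levels(reference), levels(pred))
pred_aligned <- factor(pred, levels = all_levels)
reference_aligned <- factor(reference, levels = all_levels)

table(Prediction = pred_aligned, Reference = reference_aligned)
```

If `extra` is not empty, the predictions have levels that the reference does not, which is the situation the error message describes.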
