I'm using R. I split the data into training and test sets so that I can measure prediction accuracy.

This is my code:

library("tree")
credit<-read.csv("C:/Users/Administrator/Desktop/german_credit (2).csv")

library("caret")
set.seed(1000)

intrain<-createDataPartition(y=credit$Creditability,p=0.7,list=FALSE)
train<-credit[intrain, ]
test<-credit[-intrain, ]

treemod<-tree(Creditability~. , data=train)
plot(treemod)
text(treemod)

cv.trees<-cv.tree(treemod,FUN=prune.tree)
plot(cv.trees)

prune.trees<-prune.tree(treemod,best=3)
plot(prune.trees)
text(prune.trees,pretty=0)

install.packages("e1071")
library("e1071")
treepred<-predict(prune.trees, newdata=test)

confusionMatrix(treepred, test$Creditability)

confusionMatrix fails with the following error:

Error in confusionMatrix.default(rpartpred, test$Creditability) : the data cannot have more levels than the reference

The credit data can be downloaded from this site:
http://freakonometrics.free.fr/german_credit.csv

Kyll
Young Jae Seo
  • Please include all the relevant data in your post. Linking to an off-site resource makes this question very localized, in scope but especially in time. A simulated dataset is a very convenient way of conveying what is going on with your data. See [this page](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for tips on how to share your data and code in a reproducible way. – Roman Luštrik Aug 03 '16 at 14:02

2 Answers

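If you look carefully at your plots, you will see that you are training a regression tree and not a classification tree. Creditability is read in as a numeric 0/1 column, so predict returns numbers rather than class labels.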

If you run credit$Creditability <- as.factor(credit$Creditability) after reading in the data and use type = "class" in the predict function, your code should work.

code:

credit <- read.csv("http://freakonometrics.free.fr/german_credit.csv" )

credit$Creditability <- as.factor(credit$Creditability)

library(caret)
library(tree)
library(e1071)

set.seed(1000)
intrain <- createDataPartition(y = credit$Creditability, p = 0.7, list = FALSE)
train <- credit[intrain, ]
test <- credit[-intrain, ]

treemod <- tree(Creditability ~ ., data = train)

cv.trees <- cv.tree(treemod, FUN = prune.tree)
plot(cv.trees)

prune.trees <- prune.tree(treemod, best = 3)
plot(prune.trees)
text(prune.trees, pretty = 0)

treepred <- predict(prune.trees, newdata = test, type = "class")
confusionMatrix(treepred, test$Creditability)
phiver
  • More or less. The code then predicts the probability that each entry of `test` belongs to class '0' or '1', so the OP has to convert these predicted probabilities to predicted classes. – StatMan Aug 03 '16 at 12:07
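
As a rough illustration of the conversion mentioned in the comment above, here is a minimal base-R sketch. It assumes the predictions arrive as a matrix with one column per class; the matrix `prob` and its values are made up for the example and do not come from the credit data.

```r
# Hypothetical class-probability matrix: one row per test observation,
# one column per class (column names are the class labels).
prob <- matrix(c(0.18, 0.82,
                 0.65, 0.35,
                 0.33, 0.67),
               ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("0", "1")))

# For each row, pick the column with the highest probability and turn
# the resulting labels into a factor with the full set of class levels.
pred_class <- factor(colnames(prob)[max.col(prob, ties.method = "first")],
                     levels = colnames(prob))

print(pred_class)
```

A factor built this way should be usable as the first argument of confusionMatrix, provided the reference is a factor with the same levels.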
I had the same issue in a classification problem. It turned out that one group had zero observations, which is why I got the error "the data cannot have more levels than the reference".

Make sure that all groups in your test set also appear in your training set.
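
One possible way to check for this situation is to compare the factor levels directly. The sketch below uses made-up labels (`train_y`, `test_y`, `pred` are hypothetical names, not part of the original code) in which class "c" never occurs in the test split.

```r
# Hypothetical labels: class "c" occurs in training but not in the test split.
train_y <- factor(c("a", "b", "c", "a", "b"))
test_y  <- factor(c("a", "b", "a"))

# Hypothetical predictions, carrying the levels seen during training.
pred <- factor(c("a", "c", "a"), levels = levels(train_y))

# Levels present in the predictions but absent from the reference.
missing_levels <- setdiff(levels(pred), levels(test_y))
print(missing_levels)

# Give both factors the same full set of levels before comparing them.
all_levels <- union(levels(train_y), levels(test_y))
test_y <- factor(test_y, levels = all_levels)
pred   <- factor(pred, levels = all_levels)

table(Predicted = pred, Actual = test_y)
```

If `missing_levels` is non-empty, the prediction factor has levels that the reference lacks, which is the condition the error message describes.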

pockeystar