I've trained an SVM model. I'm now trying to generate a confusion matrix and keep getting the following error:

Error in confusionMatrix.default(test.pred, data_test$FAVOURITES_COUNT) : the data cannot have more levels than the reference

Here is the code

model <- svm(FAVOURITES_COUNT ~ ., data = data_train)

test.pred <- predict(model, data_test, na.action = na.pass)

confusionMatrix(test.pred, data_test$FAVOURITES_COUNT)

I have tested whether they have the same levels using:

> identical (levels(test.pred), levels(data_test$FAVOURITES_COUNT))
[1] TRUE

Structure of both test.pred and data_test$FAVOURITES_COUNT:

> str(test.pred)
 Named num [1:440] 1539 1516 1560 1560 1450 ...
 - attr(*, "names")= chr [1:440] "1" "4" "11" "13" ...

> str(data_test$FAVOURITES_COUNT)
 int [1:440] 62 10725 84 84 19 99 54 84 84 84 ...

I think the problem is related to the differing chr and int types, but I don't know how to solve it. There is already another question like this, but it doesn't provide a solution. Also, if I convert the predictions to int:

pred<-as.integer(format(round(predict(model,data_test))))

the problem is still there. How can I resolve this error?

Dataset

data

Complete code:

rm(list = ls())
library(caret)   # createDataPartition(), confusionMatrix()
library(e1071)   # svm()

df <- read.csv("path/data.csv")
mydata <- df
mydata$ALTMETRIC_ID <- NULL   # drop the identifier column

# 60/40 train/test split
split <- 0.60
trainIndex <- createDataPartition(mydata$FAVOURITES_COUNT, p = split, list = FALSE)
data_train <- mydata[trainIndex, ]
data_test <- mydata[-trainIndex, ]

model <- svm(FAVOURITES_COUNT ~ ., data = data_train)

test.pred <- predict(model, data_test, na.action = na.pass)

confusionMatrix(test.pred, data_test$FAVOURITES_COUNT)
hyeri
  • It is always better to include a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610) – Jaap Nov 09 '16 at 21:22
  • @ProcrastinatusMaximus added – hyeri Nov 09 '16 at 21:34

2 Answers

I faced the same issue because the algorithm was predicting results for only one outcome. Make sure the test set contains enough examples of each outcome class, such as N and O in my case.
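
A minimal sketch of how such a mismatch can be checked and aligned in base R. The label vectors `truth` and `pred` are hypothetical stand-ins (not the asker's data), and the example assumes the case where the test set happens to contain only one class while the predictions use both:

```r
# Hypothetical labels: the test set happens to contain only class "N",
# while the model's predictions use both "N" and "O".
truth <- factor(c("N", "N", "N", "N"))
pred  <- factor(c("N", "O", "N", "N"), levels = c("N", "O"))

table(truth)                       # class counts in the test set
nlevels(pred) > nlevels(truth)     # TRUE: more levels in the data than in the reference

# Re-declare both vectors over the union of the observed labels.
all_levels    <- union(levels(truth), levels(pred))
truth_aligned <- factor(truth, levels = all_levels)
pred_aligned  <- factor(pred,  levels = all_levels)

# Base-R cross-tabulation of predictions against the reference.
table(Predicted = pred_aligned, Actual = truth_aligned)
```

Aligning the levels only makes the comparison possible; a test set that is missing a class still cannot say anything about how well that class is predicted, so a stratified or larger split is probably the better fix.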

SoodP

I had the exact same issue. You need to convert the dependent variable column FAVOURITES_COUNT to a factor before creating the train and test sets, i.e.:

mydata$FAVOURITES_COUNT<-factor(mydata$FAVOURITES_COUNT)
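
To illustrate why the order matters, here is a rough sketch on made-up data. The data frame, the column `x`, and the plain `sample()` split are assumptions for illustration only; the question itself uses `createDataPartition()` from caret. It also shows why the `identical(levels(...))` check in the question returns TRUE: numeric vectors have no levels, so both sides are NULL.

```r
set.seed(1)

# Hypothetical stand-in for the real data: a small integer outcome column.
mydata <- data.frame(
  FAVOURITES_COUNT = sample(c(19L, 62L, 84L), 50, replace = TRUE),
  x = rnorm(50)
)

is.null(levels(mydata$FAVOURITES_COUNT))   # TRUE: integer vectors carry no levels

# Convert before splitting so both subsets inherit the same level set.
mydata$FAVOURITES_COUNT <- factor(mydata$FAVOURITES_COUNT)

idx <- sample(nrow(mydata), size = 30)     # plain random split, for illustration only
data_train <- mydata[idx, ]
data_test  <- mydata[-idx, ]

# Row subsetting keeps the full level set, even for levels absent from a subset.
identical(levels(data_train$FAVOURITES_COUNT), levels(data_test$FAVOURITES_COUNT))
```

Note that a count column with many distinct values becomes a factor with one level per value, so this approach is probably only sensible when the outcome really is categorical.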
Preston