I'm having difficulty applying the performance method from the ROCR library.

library(ROCR)  # prediction(), performance()
library(C50)   # C5.0()

# EX1: generalized linear model (works)
model <- glm(Good.Loan ~ ., data = trainSet, family = binomial(link = "logit"))
testSet$predGood.Loan <- predict(model, newdata = testSet)
pred <- prediction(predictions = testSet$predGood.Loan,
                   labels = testSet$Good.Loan)
perf <- performance(pred, measure = "tpr", x.measure = "fpr")

# EX2: C5.0 model (fails with the error shown below)
model <- C5.0(CostumerClass ~ ., data = trainSet)
predictedCostumerClass <- predict(model, testSet)
pred <- prediction(predictions = predictedCostumerClass,
                   labels = testSet$CostumerClass)
perf <- performance(pred, measure = "tpr", x.measure = "fpr")

In EX1, I build my model with a generalized linear model and then apply the performance method, and it works. When I try the same thing with a C5.0 model, I get the error

Format of predictions is invalid.

The closest help that I could find was in this article.

I can't find what format the performance method requires, or whether my prediction needs something else.

Bruno Ferreira
  • It would be nice to make your question [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610) by including sample input data or using built-in data sets. – MrFlick Jul 06 '15 at 23:11
  • possible duplicate of [How to deal with multiple class ROC analysis in R (pROC package)?](http://stackoverflow.com/questions/20518376/how-to-deal-with-multiple-class-roc-analysis-in-r-proc-package) – Jim G. Sep 19 '15 at 14:09

1 Answer

It appears that by default predict returns class labels (discrete values) for C5.0 models, while for glm models it returns the linear predictor on the scale of the link function (continuous values). You need continuous scores to make an ROC curve, because the curve is traced by trying different cut points on the score. Rather than predicting the class, you can predict the class probabilities from the model.

predictedCostumerClass <- predict(model , testSet, type="prob")
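With type="prob" the result is a matrix with one column per class, so for a two-class problem you still need to pass a single column to prediction. A minimal sketch of that step follows, assuming the C50 and ROCR packages are installed. The two-class data set built from iris, the IsVirginica column, and the every-third-row split are made up for illustration and are not from the question.

```r
library(C50)   # C5.0()
library(ROCR)  # prediction(), performance()

# Hypothetical two-class problem derived from the built-in iris data
dat <- iris
dat$IsVirginica <- factor(ifelse(dat$Species == "virginica", "yes", "no"),
                          levels = c("no", "yes"))
dat$Species <- NULL

# Every third row goes to the test set, so both classes appear in it
test_idx <- seq(1, nrow(dat), by = 3)
trainSet <- dat[-test_idx, ]
testSet  <- dat[test_idx, ]

model <- C5.0(IsVirginica ~ ., data = trainSet)

# Class probabilities: a matrix with one column per class level
probs <- predict(model, testSet, type = "prob")

# Pass only the probability of the positive class ("yes") to ROCR
pred <- prediction(predictions = probs[, "yes"],
                   labels = testSet$IsVirginica)
perf <- performance(pred, measure = "tpr", x.measure = "fpr")
auc  <- performance(pred, measure = "auc")@y.values[[1]]
```

Calling plot(perf) should then draw the ROC curve. ROCR treats the higher of the two label levels as the positive class, which is why the factor levels are set explicitly above.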
MrFlick
  • Thank you for your help. I've managed to make some progress using "prob" instead of "class", but I'm facing another problem (I don't know if I should post a new question). I'm getting the following error: Error in prediction(predictions = predictedCostumerClass, labels = testSet$CostumerClass) : Number of classes is not equal to 2. I think it's because my testSet$CostumerClass has 5 distinct values. If so, I don't know what I should do now. – Bruno Ferreira Jul 07 '15 at 18:32
  • I found my answer about multiclass ROC curves in this thread. I hope it might be useful to someone else. http://stackoverflow.com/questions/20518376/how-to-deal-with-multiple-class-roc-analysis-in-r-proc-package – Bruno Ferreira Jul 07 '15 at 19:33
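
For readers who want to stay with ROCR, one common workaround for the "Number of classes is not equal to 2" error is a one-vs-rest analysis: treat each class in turn as the positive class and everything else as negative. This is not necessarily what the linked thread recommends (that thread is about the pROC package). The sketch below assumes C50 and ROCR are installed and uses the three-class iris data as a stand-in for the five-class data in the question.

```r
library(C50)   # C5.0()
library(ROCR)  # prediction(), performance()

# Every third row goes to the test set, so all three species appear in it
test_idx <- seq(1, nrow(iris), by = 3)
trainSet <- iris[-test_idx, ]
testSet  <- iris[test_idx, ]

model <- C5.0(Species ~ ., data = trainSet)
probs <- predict(model, testSet, type = "prob")  # one column per class

# One-vs-rest: for each class, score = P(class), label = 1 if that class else 0
one_vs_rest_auc <- sapply(levels(testSet$Species), function(cls) {
  is_cls <- as.integer(testSet$Species == cls)
  pred <- prediction(predictions = probs[, cls], labels = is_cls)
  performance(pred, measure = "auc")@y.values[[1]]
})
```

The result is a named vector with one AUC per class. The same loop can return performance(pred, "tpr", "fpr") instead if you want one ROC curve per class.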