
I want to compute the ROC curve and then the AUC from a linear discriminant model. Do you know how I can do this? Here is the code:

## LDA
library(MASS)
# fit the linear discriminant model on the training set
lda.fit <- lda(Negative ~ ., data = trainSparse)
lda.fit
plot(lda.fit)
### prediction on the test set
lda.pred <- predict(lda.fit, testSparse)
# confusion matrix: true labels vs. predicted classes
table(testSparse$Negative, lda.pred$class)
  • @calimo I tried this code: `rocplot = function(pred, truth, ...){ predob = prediction(pred, truth); perf = performance(predob, "tpr", "fpr"); plot(perf, ...) }; yhat.opt = predict(lda.fit, testSparse, decision.values = TRUE); fitted.opt = attributes(yhat.opt)$decision.values; par(mfrow = c(1, 2)); rocplot(fitted.opt, testSparse["Negative"], main = "Training Data")`, but then this error appears: Error in prediction(pred, truth) : Format of predictions is invalid. – mac gionny Jan 08 '17 at 17:52
    Possible duplicate of [How to compute AUC with ROCR package](http://stackoverflow.com/questions/41523761/how-to-compute-auc-with-rocr-package) – Tobia Tesan Jan 10 '17 at 14:43

2 Answers


Simply try this:

library(ROCR)
# choose the posterior probability column carefully, it may be 
# lda.pred$posterior[,1] or lda.pred$posterior[,2], depending on your factor levels 
pred <- prediction(lda.pred$posterior[,2], testSparse$Negative) 
perf <- performance(pred,"tpr","fpr")
plot(perf,colorize=TRUE)

[plot: colorized ROC curve produced by the code above]
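
The question also asks for the AUC; ROCR can compute it from the same prediction object. A minimal sketch continuing from the code above:

# AUC from the same ROCR prediction object
auc.perf <- performance(pred, measure = "auc")
auc.perf@y.values[[1]]  # single numeric value between 0 and 1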

  • thank you very much!! Do you know how I can see the number of factors to select? And another thing: when I fit the LDA model, lda.fit = lda(Negative ~.-Positive, trainSparse), this warning appears: Warning message: In lda.default(x, grouping, ...) : variables are collinear. Is it a problem? @sandipan – mac gionny Jan 09 '17 at 11:21
  • @macgionny I think your first question is how to know the right factor level to select from lda.pred$posterior, right? Let's say the positive factor level of your response variable is 'Y'; then `prediction()` expects two arguments for each data tuple: the first is the probability, predicted by the model, that the instance is 'Y', and the second is the true label for that instance (see the sketch after these comments). As for the second question: your predictors are linearly dependent, so it is a multicollinearity problem, and that is the reason for the warning. You should run a VIF test and drop some of the variables. – Sandipan Dey Jan 09 '17 at 11:54
  • If your model has high precision & recall, the ROC curve plotted is likely to be of the above shape. – Sandipan Dey Jan 09 '17 at 11:58
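
Following up on the comment above, here is a minimal sketch for checking which column of lda.pred$posterior corresponds to which class level; the columns of the posterior matrix are named after the factor levels of the response:

# the posterior columns are named after the class levels,
# so this shows which column holds the probability of the "positive" class
colnames(lda.pred$posterior)
levels(testSparse$Negative)

For the collinearity warning, one common approach is a VIF check (for example with `vif()` from the car package, applied to a model fit with the same formula) and then dropping the most redundant predictors.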

I would do it like this, because you get the AUC measure as well, and it looks lean and slick:

install.packages("pROC")

library(pROC)

# square plotting region so the ROC curve is not distorted
par(pty = "s")

# plot the ROC curve; printing the returned roc object also reports the AUC
roc(testSparse$Negative, lda.pred$posterior[,2], plot = TRUE, legacy.axes = TRUE,
    percent = TRUE, xlab = "False Positive Percentage", ylab = "True Positive Percentage")
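
If you also want the AUC as a plain number (rather than just in the printed output), a minimal sketch using the same objects as above:

roc.obj <- roc(testSparse$Negative, lda.pred$posterior[,2])
auc(roc.obj)              # AUC as a pROC "auc" object
as.numeric(auc(roc.obj))  # AUC as a plain numeric value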