
I am new to coding, so please bear with me here. I have to plot a ROC curve for `fit`, but the following code is not drawing the line for me. I am trying to predict etype = 2, which is death, using the variables age and sex. `cancer` is the name of the dataset.

Can anyone tell me what I am doing wrong here?

Thanks so much!

cancer <- read.csv("C:/Users/Jennifer/Desktop/SurvivalRatesforColonCancer.csv")
print(cancer)

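#run descriptive stats
#(describe(), skewness() and kurtosis() are not base R; they need an add-on package loaded first, e.g. psych or moments)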
describe(cancer)
summary(cancer)
hist(cancer$age)
skewness(cancer$age)
kurtosis(cancer$age)

#Create a training and testing dataset
bound <- floor((nrow(cancer)/2))
print(bound)
cancer <- cancer[sample(nrow(cancer)),]
cancer.train <- cancer[1:bound, ]
cancer.test <- cancer[(bound+1):nrow(cancer), ]

print(cancer.train)

#create decision tree using rpart
library(rpart)
fit <- rpart(etype ~ age + sex, method="class", data=cancer.train)
printcp(fit)
plotcp(fit)
summary(fit)

#Display decision tree
plot(fit, uniform = TRUE)
text(fit, use.n=TRUE, all=TRUE, cex=0.6)

#predict using the test dataset
pred1 <- predict(fit, cancer.test, type="class")

#Place the prediction variable back in the dataset
cancer.test$pred1 <- pred1

#show re-substitution error
table(cancer.train$etype, predict(fit, type="class"))

#Display accuracy rate
sum(cancer.test$etype==pred1)/length(pred1)

#Display Confusion Matrix
table(cancer.test$etype,cancer.test$pred1)

#prune the tree so it isn't overfitted.  Prune so that it will automatically minimize the cross-
#validated error 
pfit<- prune(fit, cp=fit$cptable[which.min(fit$cptable[,"xerror"]),"CP"])
#Display decision tree
plot(pfit, uniform = TRUE)
text(pfit, use.n=TRUE, all=TRUE, cex=0.6)

#Calculate the accuracy rate of the new pruned tree
pred2 <- predict(pfit, cancer.test, type="class")
sum(cancer.test$etype==pred2)/length(pred2)



##############################################
#               ROC Curve                    #
##############################################

# for ROC curve we need probabilities so we can sort cancer.test
cancer.test$etype.probs <- predict(fit,cancer.test, type="prob")[,1] # returns prob of both cats, just need 1

roc.data <- data.frame(cutoffs = c(1,sort(unique(cancer.test$etype.probs),decreasing=T)),
                       TP.at.cutoff = 0,
                       TN.at.cutoff = 0)

for(i in 1:dim(roc.data)[1]){
  this.cutoff <- roc.data[i,"cutoffs"]
  roc.data$TP.at.cutoff[i] <- sum(cancer.test[cancer.test$etype.probs >= this.cutoff,"etype"] == 1)
  roc.data$TN.at.cutoff[i] <- sum(cancer.test[cancer.test$etype.probs < this.cutoff,"etype"] == 0)
}
roc.data$TPR <- roc.data$TP.at.cutoff/max(roc.data$TP.at.cutoff) 
roc.data$FPR <- roc.data$TN.at.cutoff/max(roc.data$TN.at.cutoff) 
roc.data$one.minus.FPR <- 1 - roc.data$FPR

with(roc.data,
     plot(x=one.minus.FPR,
          y=TPR,
          type = "l",
          xlim=c(0,1),
          ylim=c(0,1),
          main="ROC Curve for 'Fit'")     
)
abline(c(0,1),lty=2)
  • This is a debugging question, but as stated we can't run your code and see what you're seeing because we don't have the variables `fit` or `cancer.test`. Could you please make this a reproducible example? You can read more about how to make reproducible examples at http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – josliber Oct 09 '15 at 20:30
  • Thanks. I just edited it to include the entire code. Hopefully that helps. – Jenni Graff Oct 09 '15 at 20:46
  • We don't have `SurvivalRatesforColonCancer.csv`, so we still can't really run your models. Could you please include a subset of your dataset using `dput` (see the link in my previous comment)? – josliber Oct 09 '15 at 20:49
  • Assuming you have done everything right up until getting to the ROC curve, you might want to check out the ROCR and/or pROC package. Additionally, here's some code I've used in the past that may help you get started if you use the pROC package: `prob <- predict(myprobitmodel, dat, type=c("response"))` `dat$probitProb <- prob` `g <- roc(cancer ~ probitProb, data = dat, plot = T)` `g$auc` – tsurudak Oct 09 '15 at 22:15
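
Besides the package route suggested in the comments, the manual calculation from the question can be sketched in base R. This is only a sketch under stated assumptions: the CSV is not available, so the data below are simulated, and `etype` is assumed to be coded 1/2 with 2 = death (as the question says). If that coding is right, `etype == 0` in the posted loop would never match, every TN count would be 0, and dividing by `max(...) = 0` would give `NaN` rates, which might explain why no line appears. Likewise, `predict(..., type="prob")[,1]` returns the probability of the first class level, which would be etype 1 rather than death. The helper name `roc_points` and the variable `prob_death` are hypothetical, not from the original post.

```r
# Hypothetical helper: ROC points from scores, counting against an explicit positive class.
roc_points <- function(scores, labels, positive) {
  is_pos <- labels == positive
  # Inf as the first cutoff gives the (0, 0) corner: nothing is predicted positive
  cutoffs <- c(Inf, sort(unique(scores), decreasing = TRUE))
  tpr <- sapply(cutoffs, function(k) sum(scores >= k & is_pos) / sum(is_pos))
  fpr <- sapply(cutoffs, function(k) sum(scores >= k & !is_pos) / sum(!is_pos))
  data.frame(cutoff = cutoffs, TPR = tpr, FPR = fpr)
}

# Simulated stand-in data (assumption: etype is coded 1/2, with 2 = death).
set.seed(1)
n <- 200
etype <- sample(c(1, 2), n, replace = TRUE)
# Fake predicted P(etype = 2); with a real rpart fit this would be the "2" column
# of the type="prob" prediction matrix rather than column 1.
prob_death <- ifelse(etype == 2, rbeta(n, 3, 2), rbeta(n, 2, 3))

roc <- roc_points(prob_death, etype, positive = 2)

plot(roc$FPR, roc$TPR, type = "l", xlim = c(0, 1), ylim = c(0, 1),
     xlab = "False positive rate", ylab = "True positive rate",
     main = "ROC curve (simulated data)")
abline(0, 1, lty = 2)
```

With the real data, the same function could be called with the predicted probability of death as `scores` and `cancer.test$etype` as `labels`. Passing the positive class explicitly avoids hard-coding 0/1 labels that the data may not contain.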
