
After creating a simple decision tree using rpart, I want to plot its performance using ROCR. When I change the avg= parameter of plot(), the ROC curve changes significantly.

When I apply the same plotting changes to the performance of a GLM model, nothing changes. Why does this parameter only influence the tree's plot, and in what way?

# load the required packages
library(rpart)
library(ROCR)

# create tree model
bsprp <- mean(df.sub.train$y)  # mean of y in the training set
target <- y_fact ~ age + gender + a + b + c + d

m.dt <- rpart(target,
              data = df.sub.train,
              parms = list(prior = c(bsprp, 1 - bsprp)),
              cp = 0.005)


# predict on df.sub.vld
dt.predicted <- predict(m.dt, newdata = df.sub.vld)

# use the second column of the predictions as the score
dt.pred <- prediction(dt.predicted[, 2], df.sub.vld$y)
dt.perf <- performance(dt.pred, "tpr", "fpr")

# plot performance
plot(dt.perf, avg = "threshold", col = "red", lwd = 2, main = "ROC curve")
abline(0, 1, untf = FALSE, col = "lightgray", lty = 2)

# vs

plot(dt.perf, avg = "none", col = "red", lwd = 2, main = "ROC curve")
abline(0, 1, untf = FALSE, col = "lightgray", lty = 2)

An example of the dataset used:

   y_fact y      age gender bf2          a             b          c
5       1 1 71.11233   Male  40          6             0          0
10      1 1 51.83836   Male  11          5             3          0
13      1 1 70.14521 Female   7          3             1          0
15      1 1 40.00548   Male  64          6             0          0
16      1 1 55.81096   Male  55          8             1          0
19      1 1 54.45479   Male  13          3             1          0

Screenshots of the different plots:

[Image: ROC curve plotted with avg = "threshold"]

[Image: ROC curve plotted with avg = "none"]

eventive
  • It would help if you could provide a reproducible example. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for more info. – Calimo Jun 09 '19 at 19:03
