I am a total R beginner, and the sophistication of this question reflects that.

I am using the ROCR package in R to generate plotting data for ROC curves. I then use ggplot2 to draw the plot. Something like this:

library(ggplot2)
library(ROCR)

inputFile <- read.csv("path/to/file", header=FALSE, sep=" ", colClasses=c('numeric','numeric'), col.names=c('score','label'))

predictions <- prediction(inputFile$score, inputFile$label)
auc <- performance(predictions, measure="auc")@y.values[[1]]

rocData <- performance(predictions, "tpr","fpr")
rocDataFrame <- data.frame(x=rocData@x.values[[1]],y=rocData@y.values[[1]])

rocr.plot <- ggplot(data=rocDataFrame, aes(x=x, y=y)) + geom_path(size=1)
rocr.plot <- rocr.plot + geom_text(aes(x=1, y= 0, hjust=1, vjust=0, label=paste(sep = "", "AUC = ",round(auc,4))),colour="black",size=4)

This works well for drawing a single ROC curve. However, what I would like to do is read in a whole directory's worth of input files (one file of test results per classifier) and make a faceted ggplot2 plot of all the ROC curves, while still printing the AUC score into each panel.

I would like to understand the "proper" R-style approach to accomplishing this. I am sure I can hack something together by having one loop go through all files in the directory and create a separate data frame for each, then having another loop create multiple plots, and somehow getting ggplot2 to output all these plots onto the same surface. However, that does not let me use ggplot2's built-in faceting, which I believe is the right approach. I am not sure how to get my data into the proper shape for faceting, though. Should I be merging all my data frames into a single one, giving each merged chunk a name (e.g. the filename), and faceting on that? If so, is there a library or recommended practice for making this happen?

Your suggestions are appreciated. I am still wrapping my head around the best practices in R, so I'd rather get expert advice instead of just hacking things up to make code that looks more like ordinary declarative programming languages that I am used to.

EDIT: The thing I am least clear on is whether, when using ggplot2's built-in faceting capabilities, I'd still be able to output a custom string (AUC score) into each plot it will generate.

Inverseofverse
  • The answer is going to involve `lapply` to loop through your files, then `do.call(rbind, ...)` to merge the data frames into a single data frame. This single data frame should contain a column that identifies the facet variable, as you said. Then it's one call to `ggplot` and you're done. – Andrie Aug 08 '12 at 07:07
  • Thanks, Andrie. This seems like a good basic recipe for getting the multi-plot up and running, but (as per edit above), I am unsure how to go about customizing each individual plot with a string to be rendered into it. I don't see how I can associate a single string with a portion of the dataframe containing the data for each plot, and how to tell ggplot2 where to find that string and what to do with it. – Inverseofverse Aug 08 '12 at 07:12
  • If you make a small [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) that demonstrates your question, I'll take a look. Make a sample data frame, create a `ggplot` with a facet, and indicate what data you want to have in your `geom_text`. – Andrie Aug 08 '12 at 07:17
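
The recipe from the first comment above (`lapply` over the files, then `do.call(rbind, ...)`) might look roughly like the sketch below. It uses base R only. The temporary directory, the `.txt` extension, the helper name `readScores`, and the `classifier` column are all assumptions made for illustration; the random scores stand in for real classifier output.

```r
# Hypothetical setup: write two small score files so the sketch runs on its own.
inputDir <- file.path(tempdir(), "roc_inputs")
dir.create(inputDir, showWarnings = FALSE)
set.seed(1)
for (name in c("classifierA", "classifierB")) {
  scores <- data.frame(score = runif(20), label = rbinom(20, 1, 0.5))
  write.table(scores, file.path(inputDir, paste0(name, ".txt")),
              sep = " ", row.names = FALSE, col.names = FALSE)
}

# Read one file and tag every row with the file it came from.
# readScores is a hypothetical helper, not part of any package.
readScores <- function(path) {
  d <- read.csv(path, header = FALSE, sep = " ",
                colClasses = c("numeric", "numeric"),
                col.names = c("score", "label"))
  d$classifier <- tools::file_path_sans_ext(basename(path))
  d
}

# One data frame per file, then stacked into a single long data frame.
files <- list.files(inputDir, pattern = "\\.txt$", full.names = TRUE)
allScores <- do.call(rbind, lapply(files, readScores))
```

The `classifier` column is what a later `facet_wrap(~classifier)` or `facet_grid(~classifier)` call would split on.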

1 Answer

Here is an example of how to generate a plot as you described. I use the built-in dataset quakes.

The code does the following:

  • Load the ggplot2 and plyr packages
  • Add a facet variable to quakes - in this case, earthquake depth cut into five levels
  • Use ddply to summarise the mean magnitude for each depth
  • Use ggplot with geom_text to label the mean magnitude

The code:

library(plyr)
library(ggplot2)

quakes$level <- cut(quakes$depth, 5, 
  labels=c("Very Shallow", "Shallow", "Medium", "Deep", "Very Deep"))

quakes.summary <- ddply(quakes, .(level), summarise, mag=round(mean(mag), 1))

ggplot(quakes, aes(x=long, y=lat)) + 
  geom_point(aes(colour=mag)) + 
  geom_text(aes(label=mag), data=quakes.summary, x=185, y=-35) +
  facet_grid(~level) + 
  coord_map()
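
Applied to the ROC case from the question, the same pattern (one long data frame for the curves, one small summary data frame for the labels) could look something like the sketch below. This is an illustration under assumptions, not tested against the asker's files: the data are simulated, and the names `makeScores`, `allScores`, `rocFrame`, `aucFrame`, and the `classifier` column are made up for the example.

```r
library(ggplot2)
library(ROCR)

# Simulated stand-in for the merged input: one row per scored example,
# with a column naming the classifier it belongs to (hypothetical data).
set.seed(42)
makeScores <- function(name, shift) {
  label <- rep(c(0, 1), each = 50)
  data.frame(classifier = name,
             score = rnorm(100, mean = label * shift),
             label = label)
}
allScores <- rbind(makeScores("classifierA", 1), makeScores("classifierB", 2))

# Split by classifier, then build the ROC points for each piece.
pieces <- split(allScores, allScores$classifier)

rocFrame <- do.call(rbind, lapply(names(pieces), function(name) {
  pred <- prediction(pieces[[name]]$score, pieces[[name]]$label)
  roc <- performance(pred, "tpr", "fpr")
  data.frame(classifier = name, x = roc@x.values[[1]], y = roc@y.values[[1]])
}))

# One row per classifier holding the text to print in its panel.
aucFrame <- do.call(rbind, lapply(names(pieces), function(name) {
  pred <- prediction(pieces[[name]]$score, pieces[[name]]$label)
  auc <- performance(pred, measure = "auc")@y.values[[1]]
  data.frame(classifier = name, label = paste0("AUC = ", round(auc, 4)))
}))

# The text layer gets its own data; facet_wrap matches rows to panels
# through the shared classifier column.
rocPlot <- ggplot(rocFrame, aes(x = x, y = y)) +
  geom_path() +
  geom_text(data = aucFrame, aes(x = 1, y = 0, label = label),
            hjust = 1, vjust = 0, inherit.aes = FALSE) +
  facet_wrap(~classifier)
```

Printing `rocPlot` draws one panel per classifier. Because `aucFrame` has exactly one row per classifier, each panel receives a single AUC label in its bottom-right corner.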


Andrie
  • Thank you for a comprehensive example! I've learned a number of interesting things from it: proper application of ddply(), providing a data reference to geom_text, etc. My project ended up being quite different, but I was able to get it to work - and, more importantly, understand why it works by pulling your sample code apart. Appreciate it. – Inverseofverse Aug 09 '12 at 05:54
  • So much ... to process in this answer. Thanks for the succinct and wise answer. – KarateSnowMachine Aug 09 '12 at 15:50