I am new to R. Forgive me if this question has an obvious answer, but I've not been able to find a solution. I have experience with SAS and may just be thinking about this problem in the wrong way.

I have a dataset with repeated measures from hundreds of subjects with each subject having multiple measurements across different ages. Each subject is identified by an ID variable. I'd like to plot each measurement (let's say body WEIGHT) by AGE for each individual subject (ID).

I've used ggplot2 to do something like this:

ggplot(data = dataset, aes(x = AGE, y = WEIGHT )) + geom_line() + facet_wrap(~ID)

This works well for a small number of subjects but won't work for the entire dataset.

I've also tried something like this:

ggplot(data = dataset, aes(x = AGE, y = WEIGHT, group = ID, colour = ID)) + geom_line()

This also works for a small number of subjects but is unreadable with hundreds of subjects.

I've tried to subset using code like this:

temp <- split(dataset,dataset$ID)

but I'm not sure how to work with the resulting dataset. Or perhaps there is a way to simply adjust the facet_wrap so that individual plots are created?

Thanks!

Matt
  • Can you clarify your question somewhat? Are you trying to create a facet plot for multiple IDs, but only for a subset of the IDs in your whole data set? – joran Oct 02 '13 at 21:02
  • Did you try facet_wrap? What do you mean by "perhaps there is a way to simply adjust the facet_wrap so that individual plots are created"? How many IDs do you have? Can you please give a reproducible example? – Ananta Oct 02 '13 at 21:42
  • Sorry for not being more clear. I tried facet_wrap but I have too many subjects (>700) so the output was unreadable. I'm not sure if there is a way to subset the data so that you could create separate facet_plots with only 12-16 individuals per plot? – Matt Oct 03 '13 at 18:38

3 Answers

20

Because you want to split up the dataset and make a plot for each level of a factor, I would approach this with one of the split-apply-combine tools from the plyr package.

Here is a toy example using the mtcars dataset. I first create the plot and name it p, then use dlply to split the dataset by a factor and return a plot for each level. I'm taking advantage of %+% from ggplot2 to replace the data.frame in a plot.

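library(ggplot2)
library(plyr)

# Base plot; its data is swapped out for each subset below
p = ggplot(data = mtcars, aes(x = wt, y = mpg)) +
    geom_line()

# Split mtcars by cyl and return one plot per level
dlply(mtcars, .(cyl), function(x) p %+% x)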

This returns all the plots, one after another. If you name the resulting list object you can also call one plot at a time.

plots = dlply(mtcars, .(cyl), function(x) p %+% x)
plots[[1]]

Edit

I started thinking about labelling each plot with its factor level, which seems like it would be useful. Adding facet_wrap puts the level in the strip above the panel.

dlply(mtcars, .(cyl), function(x) p %+% x + facet_wrap(~cyl))
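
If a plain title is preferred over a facet strip, a minimal sketch along the same lines could use ggtitle(). Note that this version uses base split() and lapply() instead of dlply, which is a substitution on my part and not the code from the answer above; the object name titled and the title text are only illustrative.

```r
library(ggplot2)

# Split the toy data into one data frame per level of cyl
pieces <- split(mtcars, mtcars$cyl)

# Build one titled plot per piece; names(pieces) holds the factor levels
titled <- lapply(names(pieces), function(lev) {
    ggplot(pieces[[lev]], aes(x = wt, y = mpg)) +
        geom_line() +
        ggtitle(paste("cyl =", lev))
})
names(titled) <- names(pieces)

print(titled[["4"]])   # show the plot for the 4-cylinder cars
```

Because the list is named by level, a single plot can be pulled out by name rather than by position.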

Edit 2

Here is one way to save these in a single document, one plot per page, working with the list of plots named plots. I didn't change any of the pdf() defaults (so the file is written as Rplots.pdf in the working directory), but you can certainly explore the options it offers.

pdf()
plots
dev.off()
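
The snippet above relies on R auto-printing the list at the console. Inside a function, a loop, or a script run with source(), nothing is auto-printed, so a sketch that calls print() explicitly may be more robust. The file name and the use of tempdir() here are assumptions made for the example, and the list is rebuilt with base split() and lapply() so the block runs on its own.

```r
library(ggplot2)

# Rebuild a list of plots, one per level of cyl
pieces <- split(mtcars, mtcars$cyl)
plots <- lapply(pieces, function(d) {
    ggplot(d, aes(x = wt, y = mpg)) + geom_line()
})

# Hypothetical output location, chosen only for this example
out_file <- file.path(tempdir(), "plots_by_cyl.pdf")

pdf(out_file)               # open a multi-page PDF device
for (p in plots) print(p)   # each explicit print() starts a new page
invisible(dev.off())        # close the device so the file is written
```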

Updated to use package dplyr instead of plyr. The work is done in do(), and the output has a named column that contains all the plots as a list.

library(dplyr)
plots = mtcars %>%
    group_by(cyl) %>%
    do(plots = p %+% . + facet_wrap(~cyl))


Source: local data frame [3 x 2]
Groups: <by row>

  cyl           plots
1   4 <S3:gg, ggplot>
2   6 <S3:gg, ggplot>
3   8 <S3:gg, ggplot>

To see the plots in R, just ask for the column that contains the plots.

plots$plots

And to save as a pdf

pdf()
plots$plots
dev.off()
aosmith
  • Interesting - I'd never thought to put plots in a list before. – Matt Parker Oct 02 '13 at 22:53
  • Thanks! I managed to get that to work. I like how I can then display or save a subset of the plots from the list (e.g., plots[1:10], plots[200:210], etc.). I'm still having trouble getting these to save to a file within the code, but that at least gets me what I need for now. – Matt Oct 04 '13 at 13:14
  • I made an edit to show one way to save all the plots into one document. – aosmith Oct 04 '13 at 14:56
  • This is such nice clear and clean code. It helped me make a lot of graphs, which also looks awesome. The dynamic part, where you add title is very helpful! – Thorst Aug 05 '14 at 11:34
  • Just seeing your update. I was not on SO by the original posting date. Good learning here. I made it favourite. +1! – jazzurro Oct 15 '14 at 15:57
  • note that `%+%` is a function, you don't need an anonymous wrapper, `dlply(mtcars, .(cyl), "%+%", e1 = p)` – baptiste May 19 '15 at 00:09
3

A few years ago, I wanted to do something similar - plot individual trajectories for ~2500 participants with 1-7 measurements each. I did it like this, using plyr and ggplot2:

library(plyr)
library(ggplot2)

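# Make sure the output directory exists
dir.create("plots", showWarnings = FALSE)

d_ply(dat, .variables = "participant_id", .fun = function(x) {

    # Generate the desired plot
    p <- ggplot(x, aes(x = phase, y = result)) +
        geom_point() +
        geom_line()

    # Save it to a file named after the participant;
    # unique() gives a single file name even when x has several rows
    # Putting it in a subdirectory is prudent
    ggsave(file.path("plots", paste0(unique(x$participant_id), ".png")), plot = p)

})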

A little slow, but it worked. If you want to get a sense of all participants' trajectories in one plot (like your second example - aka the spaghetti plot), you can tweak the transparency of the lines (forget coloring them, though):

ggplot(data = dat, aes(x = phase, y = result, group = participant_id)) + 
    geom_line(alpha = 0.3)
Matt Parker
  • I asked [a somewhat similar question, once](http://stackoverflow.com/questions/1352863/getting-foreach-and-ggplot2-to-get-along). `plyr` is a better way to go, I think... – Matt Parker Oct 02 '13 at 22:55
  • Thanks for the suggestions. Your second idea worked just fine and was an interesting way to plot the data from all subjects as a single figure. I think your first idea is what I was looking for, however I can't get your code to run without an error. Not sure what I'm doing wrong. – Matt Oct 04 '13 at 12:57
  • @Matt Parker, this is a nice way to save each plot. I also got an error message with your code, though. It had to do with the `file.path` line. If I changed it to `file.path(paste0("plots", x$participant_id, ".png"))` this worked for me. – aosmith Oct 04 '13 at 14:55
2
lapply(temp, function(X) ggplot(X, ...))

Where X is your subsetted data

Keep in mind that you may have to print the ggplot object explicitly (print(ggplot(X, ...))), because plots are not auto-printed inside functions or loops.
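
For the follow-up in the question's comments (separate facet plots with 12-16 subjects each), the same split-and-lapply idea can be sketched as below. The data are simulated stand-ins for the asker's dataset, and the page size of 16, the helper column page, and the output file names are all assumptions made for the example.

```r
library(ggplot2)

# Simulated stand-in for the real data: 40 subjects, 5 ages each
set.seed(1)
dataset <- data.frame(
    ID  = rep(1:40, each = 5),
    AGE = rep(1:5, times = 40)
)
dataset$WEIGHT <- 3 + 2 * dataset$AGE + rnorm(nrow(dataset))

# Assign each subject to a page holding at most 16 subjects
ids <- unique(dataset$ID)
page_of_id <- ceiling(seq_along(ids) / 16)
dataset$page <- page_of_id[match(dataset$ID, ids)]

# Split by page, then build one faceted plot per page
temp <- split(dataset, dataset$page)
pages <- lapply(temp, function(X) {
    ggplot(X, aes(x = AGE, y = WEIGHT)) +
        geom_line() +
        facet_wrap(~ID)
})

# Save one file per page; ggsave() takes the plot explicitly
out_dir <- tempdir()
for (i in seq_along(pages)) {
    ggsave(file.path(out_dir, sprintf("page_%02d.pdf", i)),
           plot = pages[[i]], width = 8, height = 8)
}
```

With 40 simulated subjects this gives three pages (16, 16, and 8 subjects). With more than 700 subjects the same code would simply produce more pages.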

Señor O