I would like to generate several hundred boxplots of continuous data from a large data frame, stratified by the factor "year". I started by creating a list from the original data frame in which each element is a data frame holding one dependent variable and the year.

Here is an example data set that looks like mine:

l <- list(data.frame(year = c(rep("2010", 10), rep("2011", 10), rep("2012", 10)),
                     var1 = sample(1:100, 30, replace = TRUE)),
          data.frame(year = c(rep("2010", 10), rep("2011", 10), rep("2012", 10)),
                     var2 = sample(100:200, 30, replace = TRUE)),
          data.frame(year = c(rep("2010", 10), rep("2011", 10), rep("2012", 10)),
                     var3 = sample(25:50, 30, replace = TRUE)))
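The loop that originally built this list from the full data frame is not shown. A minimal sketch of one way to do that step, assuming a hypothetical wide data frame `df` with a `year` column plus one column per dependent variable (the column names are borrowed from the question purely for illustration):

```r
# Hypothetical wide data frame: a "year" column plus one column per variable
set.seed(1)
df <- data.frame(
  year    = rep(c("2010", "2011", "2012"), each = 10),
  M2_loss = sample(1:100, 30, replace = TRUE),
  SSC     = sample(25:50, 30, replace = TRUE)
)

# Every column except "year" is treated as a dependent variable
vars <- setdiff(names(df), "year")

# One two-column data frame (year + one variable) per dependent variable
var_list <- lapply(vars, function(v) df[, c("year", v)])
names(var_list) <- vars
```

Naming the list elements after the variables keeps each name at hand for later steps such as `ylab()`.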

The next step was to apply a ggplot2 function over the list. Neither of these calls produces a plot:

lapply(l, function(j) ggplot(j, aes(x = year, y = j[, 2], fill = year)) +
    geom_boxplot() + ylab(names(j[2])))

lapply(l, function(j) ggplot(j, aes(x = year, y = j[[1]][2], fill = year)) +
    geom_boxplot() + ylab(names(j[2])))

Both of them produce the same error message:

Error: No layers in plot

In reality, my data frame is much larger: 2800 observations and over 250 different variables with unique descriptive names (e.g. "M2_loss", "SSC"). Each variable is on a different scale, so using facets is not a good solution. What makes this question different from other examples on Stack Overflow is that I am trying to index the data rather than explicitly name it. It is important that I capture the unique name of each variable and use it to label the y-axis.

Any ideas on how to proceed?

user2899713
  • Thanks for pointing that out. The (now deleted) for loop creates the list from my original data frame, but it isn't needed for this reproducible example. – user2899713 Jan 22 '16 at 00:52

3 Answers

If I understand what you want, I think you can make things much simpler by using aes_string() instead of aes(). This lets you specify the variables of interest as strings rather than as names. Here is a simple example using the well-worn iris data set:

lapply(names(iris)[1:4], function(n)
    ggplot(data = iris, aes_string(y = n, x = "Species")) + geom_boxplot())

This generates side-by-side boxplots (by species) for each of the four quantitative variables in the iris data set and should be easy to adjust for your data frame.
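Current ggplot2 releases deprecate aes_string() in favour of tidy evaluation inside aes(). A sketch of the same idea using the .data pronoun, which still picks the columns by position and reuses each column name as the y-axis label, as the question asks (the `fill` mapping mirrors the question's code and is optional):

```r
library(ggplot2)

# Select the quantitative columns by position; their names become the labels
num_vars <- names(iris)[1:4]

plots <- lapply(num_vars, function(n) {
  # .data[[n]] looks up the column whose name is stored in n
  ggplot(iris, aes(x = Species, y = .data[[n]], fill = Species)) +
    geom_boxplot() +
    ylab(n)
})
names(plots) <- num_vars
```

Printing any element, e.g. `print(plots[["Sepal.Width"]])`, displays that plot.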

rpruim

If you want lapply() to actually produce output on the console's screen device, add a geom_boxplot() layer and wrap each plot in print():

 plist <- lapply(l, function(j) print(ggplot(j, aes(x = year, y = j[, 2], fill = year)) +
                                      ylab(names(j[2])) + geom_boxplot()))

If you want to store the plots in a list and print them later, leave out the print() call:

 plist <- lapply(l, function(j) ggplot(j, aes(x = year, y = j[, 2], fill = year)) +
                                ylab(names(j[2])) + geom_boxplot())
# To print the first plot:
plist[[1]]
IRTFM
  • It looks like I made another error in my attempt to create a reproducible example - I left out the boxplot geom that is normally included in my much more extensive ggplot2 scripts. That is corrected now. When I run your code, I get this error message: "Error in eval(expr, envir, enclos) : object 'j' not found". – user2899713 Jan 22 '16 at 01:40
  • I'm using marrangeGrob() to plot the boxplots in R Markdown. Incredibly, R plotted them fine on Monday night in R Markdown, but when I updated the file on Tuesday, I ran into trouble. I cannot work out what exactly has changed. – user2899713 Jan 22 '16 at 01:45
  • I tested both versions on a Mac (El Cap) running R 3.2.3 with ggplot2_2.0.0. I do know that there were changes relatively recently in some of the gridExtra functions. – IRTFM Jan 22 '16 at 02:00
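With y = j[, 2] inside aes(), the expression is only evaluated when the plot is built, and ggplot2 then has to find j in the environment where the plot was created. That is one plausible route to an "object 'j' not found" error, though not a confirmed one (the asker later traced the problem to an old R installation). A sketch that sidesteps the question by looking the column up by name with the .data pronoun, and that sends every plot to its own page of a single PDF, which may be easier to handle than several hundred plots on screen. The file name `boxplots.pdf` is a placeholder:

```r
library(ggplot2)

# Example list in the shape of the question's data: "year" plus one variable
set.seed(42)
yrs <- rep(c("2010", "2011", "2012"), each = 10)
l <- list(
  data.frame(year = yrs, var1 = sample(1:100, 30, replace = TRUE)),
  data.frame(year = yrs, var2 = sample(100:200, 30, replace = TRUE)),
  data.frame(year = yrs, var3 = sample(25:50, 30, replace = TRUE))
)

plist <- lapply(l, function(j) {
  v <- names(j)[2]  # the variable's own name, used for both y and its label
  ggplot(j, aes(x = year, y = .data[[v]], fill = year)) +
    geom_boxplot() +
    ylab(v)
})

# One plot per page of a single PDF (placeholder file name)
out_file <- "boxplots.pdf"
pdf(out_file)
for (p in plist) print(p)
invisible(dev.off())
```

The same list can also be passed to gridExtra's marrangeGrob() if several plots per page are preferred.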

The issue turned out to be an old version of R (3.2.2) that was confusing RStudio. Once I deleted the old version, the problem was solved: my original lapply() call (the first example) works fine.

user2899713