How to use lapply with ggplot2 while indexing variables

Question

I would like to generate several hundred boxplots of continuous data from a large data frame, stratified by the factor "year". I started by creating a list from the original data frame that contains each dependent variable and the year.

Here is an example data set that looks like mine:

l <- list(
  data.frame(year = c(rep("2010", 10), rep("2011", 10), rep("2012", 10)),
             var1 = sample(1:100, 30, replace = TRUE)),
  data.frame(year = c(rep("2010", 10), rep("2011", 10), rep("2012", 10)),
             var2 = sample(100:200, 30, replace = TRUE)),
  data.frame(year = c(rep("2010", 10), rep("2011", 10), rep("2012", 10)),
             var3 = sample(25:50, 30, replace = TRUE))
)

The next step was to apply a ggplot2 function over the list. Neither of these calls produces plots:

lapply(l, function(j) ggplot(j, aes(x = year, y = j[, 2], fill = year)) +
  geom_boxplot() + ylab(names(j[2])))

lapply(l, function(j) ggplot(j, aes(x = year, y = j[[1]][2], fill = year)) +
  geom_boxplot() + ylab(names(j[2])))

The same error message is generated from those scripts:

Error: No layers in plot

In actuality, my data frame is much larger: 2800 observations and over 250 different variables with unique descriptive names (e.g. "M2_loss", "SSC"). Each variable is on a different scale, so using facets is not a good solution. What makes this question different from other examples on Stack Overflow is that I am trying to index the data rather than explicitly name it. It is important that I capture the unique name of each variable and use it to label the y-axis.

Any ideas on how to proceed?

Thanks for pointing that out. The (now deleted) for loop creates the list from my original data frame, but it isn't needed for this reproducible example. — user2899713, Jan 22 '16 at 00:52

score 3 · Answer 1 · answered Jan 22 '16 at 05:13

If I understand what you want, I think you can make things much simpler by using aes_string instead of aes. This allows you to specify the variables of interest as strings rather than as names. Here is a simple example using the well-worn iris data set:

lapply(names(iris)[1:4], function(n)
  ggplot(data = iris, aes_string(y = n, x = "Species")) + geom_boxplot())

This generates side-by-side boxplots (by species) for each of the four quantitative variables in the iris data set and should be easy to adjust for your data frame.
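A note for readers on newer ggplot2 releases: aes_string() has been soft-deprecated since ggplot2 3.0.0, and the same idea can be written with the `.data` pronoun inside aes(). The following is only a minimal sketch under that assumption (ggplot2 3.0.0 or later); the names `vars` and `plots` are illustrative and not part of the original answer.

```r
library(ggplot2)

# Names of the four numeric columns in iris (illustrative variable name)
vars <- names(iris)[1:4]

# One boxplot per variable; .data[[n]] looks the column up by its string name
plots <- lapply(vars, function(n) {
  ggplot(iris, aes(x = Species, y = .data[[n]], fill = Species)) +
    geom_boxplot() +
    ylab(n)  # label the y-axis with the variable's own name
})

# Name the list elements so a plot can be retrieved by variable name
names(plots) <- vars

# Draws the Sepal.Length boxplot when run interactively
plots[["Sepal.Length"]]
```

Naming the list elements is optional, but with several hundred variables it makes it easier to pull out a single plot later.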

Thanks for the tip on aes_string(). – user2899713 Jan 22 '16 at 05:46

score 1 · Answer 2 · answered Jan 22 '16 at 00:51

If you want the lapply call to actually draw output on the screen device, it is a matter of adding a + geom_boxplot() call (wrapped here in print()):

plist <- lapply(l, function(j) print(ggplot(j, aes(x = year, y = j[, 2], fill = year)) +
  ylab(names(j[2])) + geom_boxplot()))

If you want to store the plots in a list and draw them later, leave out the print call:

plist <- lapply(l, function(j) ggplot(j, aes(x = year, y = j[, 2], fill = year)) +
  ylab(names(j[2])) + geom_boxplot())
# To print ...
plist[[1]]
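Whether `j[, 2]` can be resolved inside aes() appears to depend on the ggplot2 and R versions in use (the asker reports an "object 'j' not found" error in the comments). One way to sidestep that is to map the column by its name instead of by position. This is a sketch only, assuming ggplot2 3.0.0 or later for the `.data` pronoun; the two-element list and the helper variable `v` are illustrative stand-ins for the question's data.

```r
library(ggplot2)

# Small stand-in for the question's list: each data frame holds year plus one variable
years <- rep(c("2010", "2011", "2012"), each = 10)
l <- list(
  data.frame(year = years, var1 = sample(1:100, 30, replace = TRUE)),
  data.frame(year = years, var2 = sample(100:200, 30, replace = TRUE))
)

plist <- lapply(l, function(j) {
  v <- names(j)[2]  # name of the dependent variable in this data frame
  ggplot(j, aes(x = year, y = .data[[v]], fill = year)) +
    geom_boxplot() +
    ylab(v)  # y-axis label taken from the column name
})

# Render every stored plot, e.g. in a script or an R Markdown chunk
for (p in plist) print(p)
```

Because the column name is captured as a string in `v`, the same string serves both for the mapping and for the y-axis label.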

answered Jan 22 '16 at 00:51

IRTFM


It looks like I made another error in my attempt to create a reproducible example - I left out the boxplot geom that is normally included in my much more extensive ggplot2 scripts. That is corrected now. When I run your code, I get this error message: "Error in eval(expr, envir, enclos) : object 'j' not found". – user2899713 Jan 22 '16 at 01:40
I'm using marrangeGrob() to plot the boxplots in Rmarkdown. Incredibly, R plotted them fine on Monday night in Rmarkdown, but when I updated the file on Tuesday, I ran into trouble. I cannot ascertain what exactly has changed. – user2899713 Jan 22 '16 at 01:45
I tested both versions on a Mac (El Cap) running R 3.2.3 with ggplot2_2.0.0. I do know that there were changes relatively recently in some of the gridExtra functions. – IRTFM Jan 22 '16 at 02:00

score 0 · Answer 3 · answered Jan 22 '16 at 05:46

The issue turned out to be an old version of R (3.2.2) that was confusing RStudio. Once I deleted the old version, the problem was solved: my original lapply() function (the first example) works fine.
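For anyone hitting a similar mismatch, a quick way to see which installation a session is actually using is to check the versions from the console. This is a small sketch using base R functions only; it assumes ggplot2 is installed.

```r
# Which R is this session actually running?
R.version.string

# Installation directory, useful when several R versions coexist
R.home()

# Which ggplot2 version is installed for this R?
packageVersion("ggplot2")
```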

answered Jan 22 '16 at 05:46

user2899713
