
I have been using lapply and sapply as my go-to functions recently. So far so good, but I am baffled as to why the following code does not work.

library(ggplot2)

df <- as.data.frame(matrix(rnorm(50), ncol = 5))
names(df) <- c("x1", "x2", "x3", "x4", "x5")
df1 <- seq_len(10)

ll <- lapply(seq(1, 5), function(i) qplot(df1, df[, i]))

I get the error:

Error in `[.data.frame`(df, , i) : undefined columns selected

Ok, apparently I made quite an unfortunate mistake in my reproducible code (I looped over 10 columns while df only has 5). It works now, but all the plots in the ll list are the same plot. When I run this:

library(gridExtra)
do.call(grid.arrange, ll)

I get the following image:

[Image: grid of the plots in ll]

All the plots are the same! I get the same result when I run this on my real data.

Asked by Pinemangoes; edited by Paul Hiemstra.

3 Answers


There is a problem with lazy evaluation here, or something like it anyway. You need to do the following:

ll <- lapply(
  seq(1, 5),
  # Copy column i into a new data.frame now, so each plot keeps its own y values
  function(i) qplot(data = data.frame(y = df[, i]), df1, y)
)

Creating a new data.frame inside the function forces df[, i] to be evaluated when each plot is made, so every plot gets its own y values.

More discussion in this other SO Post.
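To see why copying the column helps, here is a base-R sketch that does not need ggplot2. Each list element is a hypothetical stand-in for a plot: a function that returns the y values it would draw. Building `data.frame(y = df[, i])` inside `local()` copies column `i` when the element is created, so each element keeps its own data even after the loop variable has moved on. The three-column `df` is an assumption made to keep the example small.

```r
# Hypothetical stand-in for the plot list (no ggplot2 needed): each element
# is a function returning the y values its "plot" would draw.
df <- data.frame(x1 = 1:3, x2 = 4:6, x3 = 7:9)

snapshot <- list()
for (i in seq_along(df)) {
  snapshot[[i]] <- local({
    # Column i is copied into a new data.frame right now, the same idea as
    # qplot(data = data.frame(y = df[, i]), df1, y)
    d <- data.frame(y = df[, i])
    function() d$y
  })
}

# i is 3 after the loop, but each element kept its own copy of the data
snapshot[[1]]()  # 1 2 3
snapshot[[3]]()  # 7 8 9
```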

Answered by BrodieG.
  • +1! Or skip the looping entirely, and use facetting. This is how `ggplot2` was designed. – Paul Hiemstra Mar 07 '14 at 14:09
  • @PaulHiemstra, I agree, though there are some circumstances where lists of ggplot plots are actually a useful thing, but you're probably right that in this case the facetted approach is the better outcome. – BrodieG Mar 07 '14 at 14:11

The problem you get is related to lazy evaluation. qplot does not evaluate df[,i] when the plot is created. Instead, it stores the expression in the plot object, and the expression is only evaluated when the plot is drawn, which here happens in grid.arrange. At that time each plot looks up i, which by then has a value of 5 because that is the last value of i at the end of the lapply loop. Therefore, the data extracted from df is always the fifth column, and all your plots are equal.
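Here is a minimal base-R reproduction of the symptom. It assumes a small three-column `df` instead of the question's data. Each element of the hypothetical `plots` list defers its `df[, i]` lookup until it is called, in the same way the plot defers it until it is drawn.

A `for` loop is used deliberately. As far as I know, since R 3.2.0 `lapply` and the other apply functions force their arguments, so on a current R the original `lapply` code should no longer show the effect. A `for` loop still does.

```r
# Hypothetical base-R reproduction: each element stands in for a plot
# whose data lookup df[, i] is deferred until it is "drawn".
df <- data.frame(x1 = 1:3, x2 = 4:6, x3 = 7:9)

plots <- list()
for (i in seq_along(df)) {
  plots[[i]] <- function() df[, i]  # body is not evaluated here
}

# "Drawing" happens after the loop, when i is 3 for every element,
# so every element returns the third column.
drawn <- lapply(plots, function(p) p())
str(drawn)
```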

To prevent this, you need to force the data extraction to take place when the plot is created, for example using @BrodieG's method. There, a new data.frame is built inside the function, which forces the data to be copied from df immediately. Alternatively, you can call force(i) at the start of the function to evaluate i right away.
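Here is a sketch of the `force` approach, again in base R with hypothetical helper names. `make_getter_forced` calls `force(i)` before returning its closure, so `i` is fixed when the getter is created. `make_getter_lazy` leaves `i` as an unevaluated promise, which is only resolved on the first call, after the loop has finished. In the question's code the forced version would presumably read `function(i) { force(i); qplot(df1, df[, i]) }`.

```r
df <- data.frame(x1 = 1:3, x2 = 4:6, x3 = 7:9)

# Hypothetical helpers that each return a function extracting column i later
make_getter_lazy <- function(i) function() df[, i]
make_getter_forced <- function(i) {
  force(i)  # evaluate the promise for i now, while it has the right value
  function() df[, i]
}

lazy <- list()
forced <- list()
for (j in seq_along(df)) {
  lazy[[j]] <- make_getter_lazy(j)      # promise for i still refers to j
  forced[[j]] <- make_getter_forced(j)  # i is fixed to the current j
}

lazy[[1]]()    # 7 8 9: the promise is evaluated now, when j is 3
forced[[1]]()  # 1 2 3
```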

See also for more examples and explanations of lazy evaluation:


For plotting multiple columns of the same data.frame I would use facet_wrap. To use facet_wrap, you need to reshape your data into long format using melt from the reshape2 package:

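library(ggplot2)
library(reshape2)

# Add the shared x values as a column, then reshape to long format:
# one row per (xvalues, variable) pair, with the column name in 'variable'
df$xvalues = 1:10
df_melt = melt(df, id.vars = 'xvalues')
# One panel per original column
ggplot(df_melt, aes(x = xvalues, y = value)) +
    geom_point() + facet_wrap(~ variable)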

[Image: scatter plots of value against xvalues, one facet per variable]
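Here is a base-R sketch of what `melt` does to the data before `facet_wrap` receives it, using a tiny hypothetical data frame. It swaps in `stack()` for `reshape2::melt`. The long format is the same, but `stack()` names the columns `values` and `ind`, where `melt` uses `value` and `variable`. With this version, the facetting formula would therefore be `~ ind`.

```r
# Tiny hypothetical data: three value columns plus the shared x values
df <- data.frame(x1 = c(0.5, 1.5), x2 = c(2.5, 3.5), x3 = c(4.5, 5.5))
df$xvalues <- 1:2

vars <- setdiff(names(df), "xvalues")
# stack() puts the value columns under each other ('values') and records
# which column each row came from ('ind'); xvalues is repeated to match
long <- data.frame(
  xvalues = rep(df$xvalues, times = length(vars)),
  stack(df[vars])
)
long
```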

Answered by Paul Hiemstra.
  • This is indeed a better solution since it allows me to use the more elegant `ggplot` + `ggsave` combo rather than the `qplot` + `grid.arrange` mess. Thank you for your help. – Pinemangoes Mar 07 '14 at 14:17
  • I've rarely had to resort to using a loop, facetting can almost always be used to get the same result. – Paul Hiemstra Mar 07 '14 at 14:32

Your original code looped over 10 columns, but df only has 5. This works:

ll<-lapply(seq(1,5), function(i) qplot(df1,df[,i]))
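Here is a quick base-R check of that diagnosis, assuming the `df` from the question. Asking for a column beyond the fifth reproduces the error message. Deriving the indices with `seq_len(ncol(df))` avoids hard-coding the column count.

```r
df <- as.data.frame(matrix(rnorm(50), ncol = 5))
names(df) <- c("x1", "x2", "x3", "x4", "x5")

# Column 10 does not exist, so this reproduces the question's error
msg <- tryCatch(df[, 10], error = function(e) conditionMessage(e))
print(msg)

# Take the indices from the data instead of hard-coding 5 (or 10)
idx <- seq_len(ncol(df))
cols <- lapply(idx, function(i) df[, i])
length(cols)
```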