
My problem is similar to this one: when I generate plot objects (in this case histograms) in a loop, it seems that all of them are overwritten by the most recent plot.

To debug, within the loop, I am printing the index and the generated plot, both of which appear correctly. But when I look at the plots stored in the list, they are all identical except for the label.

(I'm using multiplot to make a composite image, but you get the same outcome if you call print(myplots[[1]]) through print(myplots[[4]]) one at a time.)

Because I already have an attached dataframe (unlike the poster of the similar problem), I am not sure how to solve the problem.

(By the way, the columns are factors in the original dataset I am approximating here, but the same problem occurs if they are integers.)

Here is a reproducible example:

library(ggplot2)
source("http://peterhaschke.com/Code/multiplot.R") #load multiplot function

#make sample data
col1 <- c(2, 4, 1, 2, 5, 1, 2, 0, 1, 4, 4, 3, 5, 2, 4, 3, 3, 6, 5, 3, 6, 4, 3, 4, 4, 3, 4, 
          2, 4, 3, 3, 5, 3, 5, 5, 0, 0, 3, 3, 6, 5, 4, 4, 1, 3, 3, 2, 0, 5, 3, 6, 6, 2, 3, 
          3, 1, 5, 3, 4, 6)
col2 <- c(2, 4, 4, 0, 4, 4, 4, 4, 1, 4, 4, 3, 5, 0, 4, 5, 3, 6, 5, 3, 6, 4, 4, 2, 4, 4, 4, 
          1, 1, 2, 2, 3, 3, 5, 0, 3, 4, 2, 4, 5, 5, 4, 4, 2, 3, 5, 2, 6, 5, 2, 4, 6, 3, 3, 
          3, 1, 4, 3, 5, 4)
col3 <- c(2, 5, 4, 1, 4, 2, 3, 0, 1, 3, 4, 2, 5, 1, 4, 3, 4, 6, 3, 4, 6, 4, 1, 3, 5, 4, 3, 
          2, 1, 3, 2, 2, 2, 4, 0, 1, 4, 4, 3, 5, 3, 2, 5, 2, 3, 3, 4, 2, 4, 2, 4, 5, 1, 3, 
          3, 3, 4, 3, 5, 4)
col4 <- c(2, 5, 2, 1, 4, 1, 3, 4, 1, 3, 5, 2, 4, 3, 5, 3, 4, 6, 3, 4, 6, 4, 3, 2, 5, 5, 4,
          2, 3, 2, 2, 3, 3, 4, 0, 1, 4, 3, 3, 5, 4, 4, 4, 3, 3, 5, 4, 3, 5, 3, 6, 6, 4, 2, 
          3, 3, 4, 4, 4, 6)
data2 <- data.frame(col1,col2,col3,col4)
data2[,1:4] <- lapply(data2[,1:4], as.factor)
colnames(data2)<- c("A","B","C", "D")

#generate plots
myplots <- list()  # new empty list
for (i in 1:4) {
  p1 <- ggplot(data=data.frame(data2),aes(x=data2[ ,i]))+ 
    geom_histogram(fill="lightgreen") +
    xlab(colnames(data2)[ i])
  print(i)
  print(p1)
  myplots[[i]] <- p1  # add each plot into plot list
}
multiplot(plotlist = myplots, cols = 4)

When I look at a summary of a plot object in the plot list, this is what I see

> summary(myplots[[1]])
data: A, B, C, D [60x4]
mapping:  x = data2[, i]
faceting: facet_null() 
-----------------------------------
geom_histogram: fill = lightgreen 
stat_bin:  
position_stack: (width = NULL, height = NULL)

I think that mapping: x = data2[, i] is the problem, but I am stumped! I can't post images, so you'll need to run my example and look at the graphs if my explanation of the problem is confusing.

Thanks!

LizPS

5 Answers

93

In addition to the other excellent answer, here’s a solution that uses “normal”-looking evaluation rather than eval. Since for loops have no separate variable scope (i.e. they are performed in the current environment), we need to use local to wrap the body of the loop; in addition, we need to make i a local variable, which we can do by re-assigning it to its own name [1]:

myplots <- vector('list', ncol(data2))

for (i in seq_along(data2)) {
    message(i)
    myplots[[i]] <- local({
        i <- i
        p1 <- ggplot(data2, aes(x = data2[[i]])) +
            geom_histogram(fill = "lightgreen") +
            xlab(colnames(data2)[i])
        print(p1)
    })
}

However, an altogether cleaner way is to forgo the for loop entirely and use list functions to build the result. This works in several possible ways. The following is the easiest in my opinion:

plot_data_column = function (data, column) {
    ggplot(data, aes_string(x = column)) +
        geom_histogram(fill = "lightgreen") +
        xlab(column)
}

myplots <- lapply(colnames(data2), plot_data_column, data = data2)

This has several advantages: it’s simpler, and it won’t clutter the environment (with the loop variable i).
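In recent ggplot2 releases aes_string() is soft-deprecated, so here is a sketch of the same helper written with the .data pronoun instead. The sample data frame df and the switch to geom_bar() (because the columns are factors, as in the question) are my assumptions for illustration, not part of the original answer:

```r
library(ggplot2)

# Hypothetical stand-in for the question's data2: four factor columns.
set.seed(1)
df <- data.frame(
  A = factor(sample(0:6, 60, replace = TRUE)),
  B = factor(sample(0:6, 60, replace = TRUE)),
  C = factor(sample(0:6, 60, replace = TRUE)),
  D = factor(sample(0:6, 60, replace = TRUE))
)

plot_data_column <- function(data, column) {
  # `column` is a string; .data[[column]] looks that column up in the
  # plot data when the plot is built. `column` itself lives in this
  # function call's own environment, so each plot keeps its own value.
  ggplot(data, aes(x = .data[[column]])) +
    geom_bar(fill = "lightgreen") +
    xlab(column)
}

myplots <- lapply(colnames(df), plot_data_column, data = df)
names(myplots) <- colnames(df)
```

Each element can then be printed on its own, e.g. print(myplots[["B"]]), or passed to multiplot as before.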


[1] This might seem confusing: why does i <- i have any effect at all? Because by performing the assignment we create a new, local variable with the same name as the variable in the outer scope. We could equally have used a different name, e.g. local_i <- i.
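The effect of local plus the self-assignment can be seen without ggplot2 at all. This is only a base-R sketch of the same mechanism: the closures stand in for the delayed evaluation of the aesthetic, and the names delayed and captured are made up for the example.

```r
# Plain for loop: every closure looks up `i` in the same enclosing
# environment, and only when it is finally called.
delayed <- list()
for (i in 1:3) {
  delayed[[i]] <- function() i
}
delayed_values <- vapply(delayed, function(f) f(), integer(1))

# local() creates a fresh environment on each iteration, and `i <- i`
# copies the current value into it, so each closure keeps its own `i`.
captured <- list()
for (i in 1:3) {
  captured[[i]] <- local({
    i <- i
    function() i
  })
}
captured_values <- vapply(captured, function(f) f(), integer(1))

print(delayed_values)   # 3 3 3
print(captured_values)  # 1 2 3
```

The first result is the same symptom as in the question: everything reflects the final value of i.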

Konrad Rudolph
  • nice idea with the function – Rorschach Aug 13 '15 at 17:14
  • Thank you so much, especially for the lapply version; I wanted to functionalize this but couldn't figure it out, and decided to do (superficially easier, actually horrible) for loop. I figured it was a variable scope problem, I am often fighting them in R! – LizPS Aug 14 '15 at 18:27
  • Both these solutions are unwieldy. For some reason, `myplots` burgeons to GB's per iteration in my environment. Using both the local method or function/lapply method. – BigTimeStats May 10 '18 at 16:30
  • @BigTimeStats Well that’s an issue with having many very big plots, not with either of these solutions. A common solution is to subsample the number of data points you plot (often, such big plots won’t reliably display all individual data points anyway), or to compute summary statistics ahead of plotting (and plot these rather than the raw data). But sometimes neither works. In that case, the only solution is to avoid having multiple plots in memory at once. – Konrad Rudolph May 10 '18 at 17:29
  • Gotcha, thanks for the response.. It's weird, in my environment pane I see the list takes up 118 GB but in my Task Manager, my rstudio session is barely 5Gb. – BigTimeStats May 10 '18 at 17:51
  • @BigTimeStats The estimate in the environment pane is notoriously unreliable. A large part of the reason is that it estimates each object’s size individually but lots of objects in R (particularly data frames) share memory: if you create one data frame from another by modifying one column, then they will share the memory for all remaining columns. – Konrad Rudolph May 10 '18 at 17:55
  • why do you have 'data' and 'data2' in the function? – baxx Oct 08 '19 at 11:54
  • @KonradRudolph I tried your recommendation with `local(...)` but could not get it to work. Would you have any suggestions for [my use case here on SO](https://stackoverflow.com/questions/62423707/ggplots-stored-in-plot-list-to-respect-variable-values-at-time-of-plot-generatio)? – mavericks Jun 17 '20 at 07:36
21

Because of all the quoting of expressions that get passed around, the i in the aesthetic is not evaluated until the plot is actually drawn, so it takes whatever value i happens to have at that time; once the loop has finished, that is its final value. You can get around this by using eval(substitute(...)) to substitute in the right value during each iteration.

myplots <- list()  # new empty list
for (i in 1:4) {
    p1 <- eval(substitute(
        ggplot(data=data.frame(data2),aes(x=data2[ ,i]))+ 
          geom_histogram(fill="lightgreen") +
          xlab(colnames(data2)[ i])
    ,list(i = i)))
    print(i)
    print(p1)
    myplots[[i]] <- p1  # add each plot into plot list
}
multiplot(plotlist = myplots, cols = 4)
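To see what substitute contributes here, a small base-R sketch compares a merely quoted expression with a substituted one. The matrix x and the list names are made up for illustration:

```r
x <- matrix(1:6, ncol = 3)

quoted <- list()
substituted <- list()
for (i in 1:3) {
  quoted[[i]] <- quote(x[, i])                         # still refers to `i`
  substituted[[i]] <- substitute(x[, i], list(i = i))  # `i` replaced by its value
}

print(deparse(quoted[[1]]))       # "x[, i]"
print(deparse(substituted[[1]]))  # "x[, 1L]"

# After the loop `i` is 3, so the quoted expression yields the last
# column, while the substituted one still yields the first.
print(eval(quoted[[1]]))          # 5 6
print(eval(substituted[[1]]))     # 1 2
```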
Rorschach
  • The diagnosis is correct but the solution is somewhat convoluted. It’s easier to capture `i` in a local context. The problem is that `for` loops in R have no scope so you need to use `local` instead: `for (i in 1:4) local({i = i; … rest of the loop … })`. The self-assignment `i = i` isn’t by accident; this is actually needed. A different variable name can also be used. Regardless, all this would be unnecessary by using “proper” list functions instead of `for`, which is frankly a bad language construct in R. – Konrad Rudolph Aug 13 '15 at 16:53
  • @KonradRudolph `local` is nice – Rorschach Aug 13 '15 at 16:56
  • Ah, I forgot something: if `local` is used, the assignment to `myplots[[i]]` needs to use the `<<-` operator instead of local assignment. – Konrad Rudolph Aug 13 '15 at 17:00
  • @KonradRudolph any chance you want to add a solution using one of the `apply` functions. It seems, in that case a substitution or local would also be required? Also, is there a reason that `local` is better than the `substitute` way? – Rorschach Aug 13 '15 at 17:03
  • I prefer `local` because it looks like it’s performing standard evaluation (although that’s not the case of course). it hides the `eval`s and `substitute`s away. In fact neither `lapply` nor `for` really needs to capture the variable `i` if column names are used in the aesthetics. I’ll add an answer. – Konrad Rudolph Aug 13 '15 at 17:07
  • if number of plots is more than 5-6 then you might need to repeat last line `multiplot(plotlist = myplots, cols = 4)` to show all plots – BData Mar 09 '20 at 07:25
3

I have run the code in the question and in the answer, changing geom_histogram to geom_bar to avoid the error: Error: StatBin requires a continuous x variable.
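If an actual histogram is wanted rather than a bar chart, one option is to convert the factor back to numbers first. This is only a sketch with a hypothetical factor f; note that as.numeric() on a factor returns the internal level codes, so the conversion goes through as.character():

```r
library(ggplot2)

# Hypothetical factor, built the same way as the columns in the question.
f <- factor(c(2, 4, 1, 0, 6))

codes  <- as.numeric(f)                # level codes, not the original values
values <- as.numeric(as.character(f))  # the original numbers

# With a continuous x, geom_histogram() is accepted again.
p <- ggplot(data.frame(x = values), aes(x = x)) +
  geom_histogram(binwidth = 1, fill = "lightgreen")
```

For data that really is categorical, geom_bar as used in this answer is the more natural choice.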

Here is the code with the visualizations:

Question

#generate plots
myplots <- list()  # new empty list
for (i in 1:4) {
  p1 <- ggplot(data=data.frame(data2),aes(x=data2[ ,i]))+ 
    geom_bar(fill="lightgreen") +
    xlab(colnames(data2)[ i])
  print(i)
  print(p1)
  myplots[[i]] <- p1  # add each plot into plot list
}

multiplot(plotlist = myplots, cols = 4)
#> Loading required package: grid

Answer

myplots <- vector('list', ncol(data2))

for (i in seq_along(data2)) {
    message(i)
    myplots[[i]] <- local({
        i <- i
        p1 <- ggplot(data2, aes(x = data2[[i]])) +
            geom_bar(fill = "lightgreen") +
            xlab(colnames(data2)[i])
        print(p1)
    })
}

multiplot(plotlist = myplots, cols = 4)

Same result using lapply:


plot_data_column = function (data, column) {
    ggplot(data, aes_string(x = column)) +
        geom_bar(fill = "lightgreen") +
        xlab(column)
}

myplots <- lapply(colnames(data2), plot_data_column, data = data2)

multiplot(plotlist = myplots, cols = 4)
#> Loading required package: grid

Created on 2021-04-09 by the reprex package (v0.3.0)

Emy
1

Using lapply works too as x exists within the anonymous function environment (using mtcars as data):

library(ggplot2)
library(ggthemes)  # provides theme_wsj() and scale_colour_wsj()

plot <- lapply(seq_len(ncol(mtcars)), FUN = function(x) {
  ggplot(data = mtcars) + 
    geom_line(aes(x = mpg, y = mtcars[ , x]), size = 1.4, color = "midnightblue", inherit.aes = FALSE) +
    labs(x="Date", y="Value", title = "Revisions 1M", subtitle = colnames(mtcars)[x]) +
    theme_wsj() +
    scale_colour_wsj("colors6")
})
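Why this works can be shown in base R alone. A sketch, assuming R 3.2.0 or later (where, as far as I know, lapply forces the argument it passes on); the name getters is made up:

```r
# Each call of the anonymous function gets its own environment holding
# its own `x`, so the inner closures do not share a single variable.
getters <- lapply(1:3, function(x) {
  function() x
})

values <- vapply(getters, function(f) f(), integer(1))
print(values)  # 1 2 3
```

This is the same reason the plots built inside lapply each keep their own column index.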
Paul van Oppen
0

Here is another solution:

#generate plots
myplots <- list()  # new empty list
for (col in colnames(data2)) {
  p1 <- ggplot(data=data.frame(data2),aes(x=!!ensym(col)))+ 
    geom_bar(fill="lightgreen") +
    xlab(col)
  myplots[[col]] <- p1  # add each plot into plot list
}

multiplot(plotlist = myplots, cols = 4)
#> Loading required package: grid
Avish