I am trying to create before/after comparison plots for multiple columns of my dataset in a for loop, so that each column is plotted before and after a data manipulation. Eventually, I want to save all comparison plots to one PDF file. In each iteration I generate the "before" plot, manipulate the data, generate the "after" plot, and arrange the two side by side with ggarrange (I also tried grid.arrange from gridExtra, but that does not solve the issue). What I get, however, are two identical plots that both show the data AFTER manipulation (though the titles are different).

Here is a reproducible example:

library(rlist)
library(ggplot2)
library(ggpubr)
head(iris)
plot_before <-  list()
plot_after <- list()
plots <- list()

for (i in 1:4){
  p <-  ggplot(iris,aes(iris[,i])) + geom_histogram()+ggtitle(paste0(i,"_pre"))
  print(p)
  plot_before <- list.append(plot_before,p)
  #do something with your data
  iris[,i] <- 3*iris[,i]
  p2 <-  ggplot(iris,aes(iris[,i])) + geom_histogram()+ggtitle(paste0(i,"_post"))
  print(p2)
  plot_after <- list.append(plot_after, p2)
  q <-  ggarrange(p,p2)  #here, p is already linked to modified data
  print(q)
  plots <- list.append(plots, q)
}
#try to access plots from lists
for (i in 1:4){
  print(plot_before[[i]])
  print(plot_after[[i]])
  print(plots[[i]])
}

I suppose this has something to do with ggplot creating "only" a graphics object linked to the data, so that when I print it again, it accesses the data again and fetches the manipulated data instead of a previous "snapshot". Saving the plots to separate lists does not help either; they are "linked" to the manipulated data as well.

Is there a way to make a persistent ggplot object rather than having it linked to the data?

One could, of course, create new columns with the modified data and refer to those, or create a completely new data frame, but I would like to avoid data duplication.

Elena

1 Answer

The patchwork package helps. One option is to create a list of lists of plots, flatten that list, and pass it to patchwork::wrap_plots.
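To illustrate the flattening step on its own, here is a minimal sketch with plain numbers standing in for plots (the list `nested` and its contents are made up for the example). `unlist(..., recursive = FALSE)` removes one level of nesting and joins the outer and inner names with a dot:

```r
# Hypothetical nested list: one entry per column, each holding a "pre" and a "post" element
nested <- list(
  Sepal.Length = list(pre = 1, post = 2),
  Sepal.Width  = list(pre = 3, post = 4)
)

# Remove exactly one level of nesting; the result is still a list
flat <- unlist(nested, recursive = FALSE)

names(flat)
```

The flattened list keeps the pre/post order within each column, which is what puts each pair next to each other when the list is laid out in two columns.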

A more idiomatic ggplot2 approach is to avoid passing vectors such as iris[,i] to aes and to refer to columns by name instead. I've included the way I would create the aes.
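Why vectors in aes cause trouble can be sketched as follows. As far as I know, ggplot2 evaluates the expressions given to aes only when the plot is built or printed, while the plot object keeps the data frame it was created with. The toy data frame `df` is made up for the illustration; `layer_data()` returns the data a layer would actually draw.

```r
library(ggplot2)

# Hypothetical toy data, only for this illustration
df <- data.frame(x = c(1, 2, 3), y = c(1, 1, 1))

# Mapping by column name: resolved inside the data stored in the plot
p_col <- ggplot(df, aes(x = .data[["x"]], y = y)) + geom_point()

# Mapping by external vector: `df` is looked up again when the plot is built
p_vec <- ggplot(df, aes(x = df[, "x"], y = y)) + geom_point()

# Modify the data AFTER both plots were created
df[, "x"] <- 3 * df[, "x"]

x_col <- layer_data(p_col)$x  # still the original values: 1 2 3
x_vec <- layer_data(p_vec)$x  # picks up the modified values: 3 6 9
```

Note that a column name held in a loop variable is also looked up late, so inside a loop it is safer to inline it when the plot is created, for example with `!!sym(i)`.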

Update

As per your comment: you don't want to do the data modification twice, and you don't want to store extra columns. The loop below both modifies the data and builds a list of the desired plots.

library(tidyverse)
library(patchwork)

p_list <- list()
sel_col <- names(iris)[1:4]

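for(i in sel_col){
  # "pre" plot: !!sym(i) inlines the column name, so the mapping is
  # resolved against the data stored in the plot
  p <- ggplot(iris, aes(!!sym(i))) +
    geom_histogram() +
    ggtitle(paste0(i, "_pre"))

  p_list[[i]][["pre"]] <- p

  # modify the column in place
  iris[[i]] <- 3 * iris[[i]]

  # "post" plot, built from the modified data
  p2 <- ggplot(iris, aes(!!sym(i))) +
    geom_histogram() +
    ggtitle(paste0(i, "_post"))

  p_list[[i]][["post"]] <- p2
}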

head(iris) # iris has changed
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1         15.3        10.5          4.2         0.6  setosa
#> 2         14.7         9.0          4.2         0.6  setosa
#> 3         14.1         9.6          3.9         0.6  setosa
#> 4         13.8         9.3          4.5         0.6  setosa
#> 5         15.0        10.8          4.2         0.6  setosa
#> 6         16.2        11.7          5.1         1.2  setosa

ls_unnest <- do.call(list, unlist(p_list, recursive=FALSE))

wrap_plots(ls_unnest, ncol = 2)

Created on 2020-05-07 by the reprex package (v0.3.0)

Helpful thread: How to flatten a list of lists?
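Since the goal in the question is a single PDF with one pre/post pair per page, here is a rough sketch of how that could look. It assumes the base `pdf()` device and patchwork; the working copy `dat`, the fixed `bins = 30`, the page size, and the temporary output file are choices made only for this example.

```r
library(ggplot2)
library(patchwork)

dat <- iris                     # working copy, used only in this sketch
sel_col <- names(dat)[1:2]
p_list <- list()

for (col in sel_col) {
  # inline the column name at creation time so the plot does not depend on `col` later
  pre <- ggplot(dat, aes(!!rlang::sym(col))) +
    geom_histogram(bins = 30) +
    ggtitle(paste0(col, "_pre"))

  dat[[col]] <- 3 * dat[[col]]  # the data manipulation

  post <- ggplot(dat, aes(!!rlang::sym(col))) +
    geom_histogram(bins = 30) +
    ggtitle(paste0(col, "_post"))

  p_list[[col]] <- list(pre = pre, post = post)
}

# One PDF file, one page per pre/post pair
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 10, height = 5)
for (col in names(p_list)) {
  print(wrap_plots(p_list[[col]], ncol = 2))
}
invisible(dev.off())
```

Each `print()` call starts a new page on the open device, so the file ends up with as many pages as there are columns in `sel_col`.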

tjebo
  • I don't understand, aren't pre and post once again exactly the same? Apparently, a difference in the x axis would be expected. – Rui Barradas May 07 '20 at 09:56
  • True that. I'll look into it. – tjebo May 07 '20 at 09:56
  • @RuiBarradas and Elena, see update. Some internal referencing issue I guess – tjebo May 07 '20 at 10:03
  • Yes, that's true, but my input data is ~1GB, so I try to avoid unnecessarily enlarging the workspace. Another option, of course, is simply to create new columns with the "post" data, but I wondered whether there is a way around it, i.e. whether one could work on a single stream of the same data without creating duplicate data frames or columns :-( – Elena May 07 '20 at 10:05
  • @Elena see update - do the data transformation directly in ggplot – tjebo May 07 '20 at 10:11
  • Sure, this works, but in this case I'd need to perform the data manipulation twice (once for the plot and once to actually commit the change), which is neither good style nor easy to do (if the data manipulation actually consists of multiple steps). Again, this would only be a workaround. – Elena May 07 '20 at 10:17
  • This is GREAT! Though I don't get why saving to 2 lists (as I tried) did not have the same effect as saving the plots in nested lists. I prefer the ggarrange method, though, to have each pair of plots on an individual page, in a second loop after the main loop that creates the plots: ```for (i in 1:4){ q <- ggarrange(p_list[[i]][[1]],p_list[[i]][[2]]) print(q) } ``` – Elena May 07 '20 at 11:53
  • @Elena to be fair, I don't fully understand it myself. I guess it may be how list.append works and that it is only evaluated at the end of each iteration. This is possibly avoided by the nested list approach, because you force evaluation in order to create the "i-th" list location. (?) – tjebo May 07 '20 at 12:21