
I am using ggplot2 and gridExtra to make two plots side by side with different data, and I'm observing unexpected behavior when I use vectors to make the plot instead of a data frame.

Here is an MWE with my problem:

library(ggplot2)
library(dplyr)
library(gridExtra)

cases <- c(1, 2)

df <- data.frame(
  case=cases,
  y1=c(1, 2),
  y2=c(2, 4),
  y3=c(3, 8),
  y4=c(4, 16),
  y5=c(5, 32)
)

x <- c(1, 2, 3, 4, 5)

plot_list <- list()
for(caso in cases){
  data <- df %>% filter(case == caso)
  y <- data %>% dplyr::select(starts_with('y')) %>% unlist(use.names=FALSE)
  dd <- data.frame(xdf=x, ydf=y)
  graph <- (
    ggplot()
    + geom_line(data=dd, aes(x=xdf, y=ydf))
    ## + geom_point(data=dd, aes(x=xdf, y=ydf)) # this line works
    + geom_point(aes(x=x, y=y)) # this line doesn't
  )
  plot_list[[length(plot_list)+1]] <- graph
}

grid.arrange(grobs=plot_list, ncol=2)

This code makes a plot with a line on the left and an exponential curve (y = 2^x) on the right. I marked two lines that call geom_point. If I use the line with the data frame, everything works as expected. However, if I use the line with the vectors (which were actually used to create the data frame), then the points of the curve are plotted in both graphs.

Here is the resulting figure:

Clearly, the problem is solved by using data frames instead of vectors, but I want to understand why this behavior happens in the first place. I'd appreciate any insight into why R behaves in this seemingly counter-intuitive (at least to me) way.

tjebo
pgaluzio

1 Answer


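Interesting finding. This happens because you're using a for loop, and for loops have often enough confused me with their behaviour around object creation and evaluation. aes() does not store the values of 'x' and 'y'; it records the expressions together with the environment they were created in, and ggplot only evaluates them when the plots are built, at the very end (in grid.arrange()). Inside the for loop every plot refers to the same global 'y', so by then all of them see its last value. I find the easiest way to avoid this problem is to use another way to loop; I prefer the apply family, where each call runs in its own function environment and therefore keeps its own 'y'.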
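The same mechanism can be seen without ggplot2. Below is a minimal base-R sketch (the names fns_loop and fns_apply are made up for illustration): functions created in a for loop all look up the loop variable in one shared environment when they are finally called, whereas each lapply() call gets its own environment.

```r
# Functions created in a for loop share one environment, so they look up
# 'v' only when called, i.e. after the loop has finished and v == 3.
fns_loop <- list()
for (v in 1:3) {
  fns_loop[[v]] <- function() v
}
loop_results <- sapply(fns_loop, function(f) f())
print(loop_results)  # 3 3 3

# Each lapply() call runs in a fresh function environment, so every
# returned function keeps its own 'v'. force() evaluates the argument now.
fns_apply <- lapply(1:3, function(v) {
  force(v)
  function() v
})
apply_results <- sapply(fns_apply, function(f) f())
print(apply_results)  # 1 2 3
```

force(v) should not be strictly required in recent R versions, where the apply functions force their arguments, but it makes the intent explicit.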

That said, my advice is to avoid using vectors in aes(); it only causes headaches.
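To check this without drawing anything, here is a sketch that keeps the for loop and inspects each plot with ggplot2's layer_data(), which builds a plot and returns the data of one layer. The data (x^k) and the object names are illustrative assumptions, not taken from the question. Passing a data frame to the layer stores the values at creation time; the !! variant (aes() supports rlang-style injection) is an alternative not mentioned in the answer that inlines the current value of 'y' into the mapping.

```r
library(ggplot2)

x <- 1:5
cases <- 1:2

# Problematic pattern: aes() refers to the global 'y', which is overwritten
# on every iteration and only evaluated when the plot is built.
plots_vec <- list()
for (k in cases) {
  y <- x^k
  plots_vec[[k]] <- ggplot() + geom_point(aes(x = x, y = y))
}

# Fix 1: copy the values into a data frame that the layer stores.
plots_df <- list()
for (k in cases) {
  d <- data.frame(x = x, y = x^k)
  plots_df[[k]] <- ggplot() + geom_point(data = d, aes(x = x, y = y))
}

# Fix 2 (alternative): inject the current value with !! so the mapping
# holds the vector itself instead of a reference to 'y'.
plots_inj <- list()
for (k in cases) {
  y <- x^k
  plots_inj[[k]] <- ggplot() + geom_point(aes(x = x, y = !!y))
}

# Inspect the y values of the first plot in each list.
layer_data(plots_vec[[1]])$y  # 1 4 9 16 25 (values from the last iteration)
layer_data(plots_df[[1]])$y   # 1 2 3 4 5
layer_data(plots_inj[[1]])$y  # 1 2 3 4 5
```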

I just found this thread, which explains the problem much better; I suggest closing this question as a duplicate: "for" loop only adds the final ggplot layer

library(ggplot2)
library(dplyr)

df <- data.frame( case=1:2, y1=c(1, 2), y2=c(2, 4), y3=c(3, 8), y4=c(4, 16), y5=c(5, 32))

x <- 1:5

plot_list <- lapply(1:2, function(i){
  data <- df %>% dplyr::filter(case == i)
  y <- data %>% dplyr::select(starts_with('y')) %>% unlist(use.names=FALSE)
  graph <- ggplot() + 
    geom_point(aes(x=x, y=y)) 
  graph
})

gridExtra::grid.arrange(grobs=plot_list, ncol=2)

Created on 2022-02-08 by the reprex package (v2.0.1)

tjebo