
I am trying to write some functions based on ggplot to create custom plots. Writing the functions themselves is straightforward, but I cannot figure out how to share information between them to help with the plot creation.

Here is a simplified example

library(ggplot2)

my_ggplot <- function(dat, x, y, cols) {
  ggplot(dat, aes(!!sym(x), !!sym(y))) + 
    geom_point(color = cols$dots)
}


my_geom <- function(dat, x, cols) {
  xmean <- mean(dat[[x]], na.rm = TRUE)
  # return the layers as a list so they can be added to a plot with `+`
  list(
    geom_smooth(color = cols$line, method = "loess", se = FALSE),
    geom_vline(xintercept = xmean, color = cols$line)
  )
}


mycolors <- list(dots = "blue", line = "red")

Here, my_ggplot creates the base plot and, if I want to, I can add a couple of lines to it using my_geom. I need a way to control the colors, so I have defined an S3 class object, which in this example is simply the list mycolors.

So, when passing all the parameters to each function, the result is perfectly fine:

my_ggplot(mpg, 'displ', 'hwy', mycolors) +
  my_geom(mpg, "displ",  mycolors)

But I want to be able to "inherit" values from my_ggplot to my_geom so that the following code could work:

my_ggplot(mpg, 'displ', 'hwy', mycolors) +
  my_geom()

At the same time, my_geom should keep a certain level of independence in case I want to use it with other ggplot() calls. It is especially important for me to be able to pass the dataset between the functions. In the example I calculate the mean and use it later in geom_vline to keep things simple, but in practice I need to do some data wrangling and calculations before I can pass the values to the geom.

bretauv
teoten

2 Answers

Another option: give the data and color arguments a default of NULL and use a simple if/else to build the list of layers depending on whether data was supplied. Which variant fits best really depends on the use case. My example has two conditionals: one for the data, and one for the color scale (used when no data was passed to the second function).

It might be best to create your own stat; that depends on the kind of data transformation and geometry you have in mind. geom_vline is a bit of a special case and might not be the best example.

The advantage of this little bit of extra effort is that the line does not need a hard-coded y aesthetic.

I think Stefan's approach with the color is excellent - I've used this here too.

library(ggplot2)

my_ggplot <- function(dat, x, y, cols) {
  ggplot(dat, aes(x = !!sym(x), y = !!sym(y))) +
    geom_point(aes(color = "dots"), show.legend = FALSE) +
    scale_color_manual(values = cols) 
}

StatMyline <- ggproto("StatMyline", Stat,
  compute_group = function(data, scales) {
    data.frame(
      x = mean(data$x),
      xend = mean(data$x),
      y = -Inf, yend = Inf
    )
  },
  required_aes = c("x", "y")
)
stat_myline <- function(mapping = NULL, data = NULL, geom = "segment",
                        position = "identity", na.rm = FALSE, show.legend = NA,
                        inherit.aes = TRUE, ...) {
  layer(
    stat = StatMyline, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

mycolors <- list(dots = "blue", line = "red")

my_geom <- function(dat = NULL, x = NULL, cols = NULL) {
  ## if dat is provided, compute the mean from it; the color scale is
  ## expected to come from the plot (e.g. from my_ggplot)
  if (!is.null(dat)) {
    xmean <- mean(dat[[x]], na.rm = TRUE)
    list(
      geom_smooth(aes(color = "line"), method = "loess", se = FALSE, show.legend = FALSE),
      geom_vline(aes(color = "line", xintercept = xmean), show.legend = FALSE)
    )
  } else {
    ## otherwise inherit data and aesthetics from the plot and let the stat
    ## compute the mean; add a color scale only if cols was supplied
    list(
      geom_smooth(aes(color = "line"), method = "loess", se = FALSE, show.legend = FALSE),
      stat_myline(aes(color = "line"), show.legend = FALSE),
      if (!is.null(cols)) scale_color_manual(values = cols) else NULL
    )
  }
}

p1 <- my_ggplot(mpg, "displ", "hwy", mycolors) +
  my_geom(mpg, "displ", mycolors) +
  ggtitle("With data + color ")

p2 <- my_ggplot(mpg, "displ", "hwy", mycolors) +
  my_geom() +
  ggtitle("Inheriting data + color")

p3 <- ggplot(mtcars, aes(hp, mpg)) +
  geom_point() +
  my_geom(cols = mycolors) +
  ggtitle("without my_ggplot")

library(patchwork)
p1 + p2 + p3
#> `geom_smooth()` using formula = 'y ~ x'
#> `geom_smooth()` using formula = 'y ~ x'
#> `geom_smooth()` using formula = 'y ~ x'

Created on 2023-04-13 with reprex v2.0.2

tjebo
  • Your approach is really helpful. I'm not sure that `scale_color_manual` would work for all my cases; I'll experiment with it. But I like the way of passing the data. I still don't fully get the `ggproto` usage and need to study it more, but if I understood correctly, `StatMyline` is an object that holds the data and the values for `x` and `y`, which are then used by `stat_myline` to create the segment specified in `layer` instead of `geom_vline`. Am I correct? And does `StatMyline` take its values from any `ggplot` or `geom_` called before `stat_myline`? – teoten Apr 14 '23 at 07:22
  • @teoten It can be really confusing. I admit that I don't fully understand ggproto either. A `Stat` is a ggproto object where your data transformation happens (like computing a mean from the data). The result is then passed to a "Geom", which determines how to draw it. The simplest stat is `StatIdentity`, which just returns the data. The `stat_...` function is indeed there to create a layer in your ggplot object, but you could also create a `geom_...` function; `geom_` and `stat_` functions are often nearly identical in their usage. – tjebo Apr 16 '23 at 15:32
  • I am using a segment geometry with this stat, so the data transformation (done in `compute_group`) needs to return a data frame that has the aesthetics required by this geom: x/xend/y/yend. – tjebo Apr 16 '23 at 15:33
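
To make the exchange above about `Stat` objects more concrete, here is a minimal sketch (not part of the original answer) that inspects what the stat hands to the geom. It assumes ggplot2 is installed and uses `layer_data()`, which returns the data of a layer after its stat has run. `StatMyline` is repeated from the answer only so that the block is self-contained.

```r
library(ggplot2)

# Repeated from the answer: one row per group holding the mean of x
StatMyline <- ggproto("StatMyline", Stat,
  compute_group = function(data, scales) {
    # `data` has one column per mapped aesthetic (x, y, ...),
    # not the original column names of the data frame
    data.frame(
      x = mean(data$x),
      xend = mean(data$x),
      y = -Inf, yend = Inf
    )
  },
  required_aes = c("x", "y")
)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  layer(
    stat = StatMyline, geom = "segment", data = NULL, mapping = NULL,
    position = "identity", inherit.aes = TRUE, params = list(na.rm = FALSE)
  )

# Data of the second layer after the stat has been applied
computed <- layer_data(p, 2)
computed[, c("x", "xend", "y", "yend")]
```

The stat does not store any data itself. It receives the plot's data (already renamed to aesthetic names) when the plot is built, which is why the layer works with whatever `ggplot()` call it is added to.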

One possible approach to remove the dependency on the dat and x arguments is to use stat_summary to compute the mean of the variable mapped to the x aesthetic and to add the vline, similar to my answer on this post. Second, for the colors, one option is to map on the color aesthetic and to set the color palette via scale_color_manual. This way the colors are available in my_geom too. Of course, this only works when you create your plot via my_ggplot. Not perfect.

library(ggplot2)

my_ggplot <- function(dat, x, y, cols) {
  ggplot(dat, aes(!!sym(x), !!sym(y))) +
    geom_point(aes(color = "dots"), show.legend = FALSE) +
    scale_color_manual(values = cols)
}

my_geom <- function() {
  list(
    geom_smooth(aes(color = "line"), method = "loess", se = FALSE, show.legend = FALSE),
    stat_summary(aes(xintercept = after_stat(x), y = 0, color = "line"),
      fun = mean, geom = "vline", orientation = "y", show.legend = FALSE
    )
  )
}

mycolors <- list(dots = "blue", line = "red")

my_ggplot(mpg, "displ", "hwy", mycolors) +
  my_geom()
#> `geom_smooth()` using formula = 'y ~ x'

Finally here is an example of applying my_geom to a ggplot created from scratch:

ggplot(mtcars, aes(hp, mpg)) +
  geom_point() +
  my_geom()
#> `geom_smooth()` using formula = 'y ~ x'

stefan
  • I saw your other thread on the search for a solution (+1 here and there) and it's such a shame that a hard-coded y is needed. That omitting it results in many lines is such odd behaviour. Might ask a question about this – tjebo Apr 13 '23 at 16:18
  • @tjebo Haha. Then actually you deserve the credit for this answer. Already had a look at this one hours ago but ended up with multiple vlines and couldn't remember that I already had figured out a solution in the past. Only your upvote reminded me of that. :D – stefan Apr 13 '23 at 16:39
  • Made my day! Excellent approach with the color btw (and I've cheekily used it in my approach too, don't see another way, really) – tjebo Apr 13 '23 at 16:45
  • Nice workaround, but it wouldn't work for my case. As I mentioned in my question <> Therefore, exchanging `geom_vline` for `stat_summary` would work only for this particular example. I don't want to remove the dependence on `dat` and `y` but rather pass them to the next function, if needed. – teoten Apr 14 '23 at 07:02
  • I see. I think you are better off creating a customized `geom` and/or `stat` as in the approach by tjebo. This way you automatically have access to the data and the variables mapped on aesthetics and could do your data wrangling much more easily. BTW: Passing the data is not the tricky part, i.e. via the `data` argument you can always do `data = ~ my_fun(.x)`. The tricky part is to get rid of the `x` argument, i.e. passing the column names mapped on the aesthetics. – stefan Apr 14 '23 at 07:18
  • Yes, that is what I am noticing in the function from tjebo. I will have to see what works better for me; maybe having to specify ONLY the aes is not so bad in the end, if I can avoid passing other arguments. But I will definitely experiment with the if/else approach he is suggesting first – teoten Apr 14 '23 at 09:20
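
The `data = ~ my_fun(.x)` idea mentioned in the comments can be sketched as follows. This is only an illustration under assumptions: `mean_displ` is a hypothetical helper standing in for whatever wrangling is needed, and it hard-codes the column name `displ`, which is exactly the limitation described above (the layer gets the plot data, but not the names of the columns mapped to the aesthetics).

```r
library(ggplot2)

# Hypothetical helper: takes the plot data and returns the data frame
# the layer should draw from. The column name "displ" is hard-coded.
mean_displ <- function(d) {
  data.frame(xmean = mean(d$displ, na.rm = TRUE))
}

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  # The formula is turned into a function and called with the plot data as .x
  geom_vline(data = ~ mean_displ(.x), aes(xintercept = xmean), color = "red")

# Data the vline layer ends up with after the plot is built
vline_data <- layer_data(p, 2)
vline_data$xintercept
```

Because the helper is evaluated when the plot is built, the same layer can be added to any plot whose data has a `displ` column, without passing the data frame a second time.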