
This is peculiar behaviour, and it is semi-alluded to in "Create geom_vline for mean value in a density plot, for a new variable in the dataframe, without create new tables".

Plot 1: When a computed after_stat(y) is used as yintercept in stat_summary with the hline geom, and no x aesthetic is passed explicitly to that layer, the result is multiple lines that have no obvious relation to the data (not obvious to me, at least). The same happens with xintercept and the vline geom when y is not specified.

Plot 2: When an x (or y) aesthetic is hard coded, this is miraculously resolved. But that defeats the point of the whole exercise, because we want to define the intercept programmatically rather than depend on hard-coded values.

My question is: Why does this happen?

I'm less interested in "how to solve this". (My solution would simply be to write my own stat, but if you have something that fixes this using stat_summary, I'd still be keen to hear it.)
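A minimal sketch of that own-stat route, assuming ggplot2's ggproto extension mechanism (Stat, compute_group, layer()) and a recent ggplot2 release. StatMeanLine and stat_mean_line() are hypothetical names chosen for illustration, not part of ggplot2. The idea is that the stat computes the mean of y per group and never looks at x, so a shared x cannot split the result into several lines.

```r
library(ggplot2)

# Hypothetical Stat (illustrative name): one mean per group, x is ignored.
StatMeanLine <- ggproto("StatMeanLine", Stat,
  required_aes = "y",
  # x and y vary within a group and are not carried over; declaring them
  # as dropped avoids the "aesthetics were dropped" warning in newer ggplot2
  dropped_aes = c("x", "y"),
  compute_group = function(data, scales) {
    # A single row per group: the mean of y becomes the hline position
    data.frame(yintercept = mean(data$y, na.rm = TRUE))
  }
)

# Hypothetical layer constructor pairing the stat with the hline geom
stat_mean_line <- function(mapping = NULL, data = NULL, ..., na.rm = FALSE,
                           show.legend = NA, inherit.aes = TRUE) {
  layer(
    stat = StatMeanLine, geom = "hline", mapping = mapping, data = data,
    position = "identity", show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

p_stat <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_mean_line()

# Layer 2 should hold a single row whose yintercept is the overall mean of hwy
stat_line <- layer_data(p_stat, 2)
stat_line$yintercept
```

Because the mean is taken per group rather than per x value, mapping a discrete variable such as colour = factor(cyl) would give one line per group, which is usually what one wants from such a layer.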

library(ggplot2)
library(patchwork) ## just for reprex
## Multiple lines
p1 <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_summary(aes(yintercept = after_stat(y)), fun = mean, geom = "hline") +
  labs(title = "Plot 1", caption = "Multiple lines -\ncorrespond to what exactly?")

## It works with a hard coded x
p2 <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_summary(aes(x = 1, yintercept = after_stat(y)), fun = mean, geom = "hline") +
  labs(title = "Plot 2", caption = "A hard coded x removes those")

p1 + p2

Created on 2023-04-13 with reprex v2.0.2

P.S. I came across this problem when trying to answer "How can I pass variables between ggplot personal functions?"

tjebo

1 Answer


The means, and hence the hlines, that stat_summary produces correspond to the mean of the variable mapped on y per (unique) value of the variable mapped on x; in Plot 1 that x is displ, inherited from the global aes(displ, hwy). This can be seen by computing the means manually. Once I realized that, I came up with the approach of simply fixing x (or y, depending on the orientation) so that there is only one x value, which gives the desired overall mean of y.
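Another way to see the per-x grouping is to inspect what the stat_summary layer actually computes, using layer_data(), which returns the built data of a given layer. This is a sketch assuming a ggplot2 release that exports layer_data(); p_free and p_fixed are illustrative names for the question's Plot 1 and Plot 2.

```r
library(ggplot2)

# Plot 1 from the question: x = displ is inherited from the global mapping
p_free <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_summary(aes(yintercept = after_stat(y)), fun = mean, geom = "hline")

# Plot 2 from the question: x is fixed to a single constant value
p_fixed <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_summary(aes(x = 1, yintercept = after_stat(y)), fun = mean, geom = "hline")

# Layer 2 is the stat_summary layer; each row is one drawn hline
lines_free  <- layer_data(p_free, 2)
lines_fixed <- layer_data(p_fixed, 2)

nrow(lines_free)           # one row per unique displ value
length(unique(mpg$displ))  # same count
nrow(lines_fixed)          # a single row
lines_fixed$yintercept     # overall mean of hwy
```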

library(ggplot2)
library(patchwork)
library(dplyr)

base <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

p1 <- base + stat_summary(aes(yintercept = after_stat(y), color = after_stat(factor(x))), fun = mean, geom = "hline") +
  labs(title = "Plot 1", caption = "Multiple lines -\ncorrespond to what exactly?", color = NULL)

means <- mpg |> 
  group_by(displ) |> 
  summarise(hwy = mean(hwy))

p3 <- base +
  geom_hline(data = means, aes(yintercept = hwy, color = factor(displ))) +
  labs(title = "... the multiple lines correspond to\nthe mean of y per unique x", color = NULL)

p1 + p3 +
  plot_layout(guides = "collect")
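For the vline case from the question, the same trick applies with y fixed instead of x. A minimal sketch, assuming ggplot2 3.3.0 or later (where stat_summary() has the orientation argument); orientation = "y" is set explicitly so that the stat summarises x per value of y instead of relying on orientation detection. p_v and v_line are illustrative names.

```r
library(ggplot2)

# vline counterpart of Plot 2: fix y to a constant and summarise x instead.
# orientation = "y" makes stat_summary() compute mean(x) per unique y.
p_v <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_summary(
    aes(y = 1, xintercept = after_stat(x)),
    fun = mean, geom = "vline", orientation = "y"
  )

# With a single y value there is a single summary row
v_line <- layer_data(p_v, 2)
nrow(v_line)
v_line$xintercept  # overall mean of displ
```

As with x = 1 in Plot 2, the constant ends up on the position scale, so expect the y axis to stretch to include it.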

stefan
  • Thanks Stefan. This makes sense. Should have come to this answer myself. But sometimes it needs cleverer people for that :) – tjebo Apr 13 '23 at 18:37
  • Nothing to do with more clever. I'm so clever that I needed clever you to remind me of my own solution. :D – stefan Apr 13 '23 at 19:00