I'm trying to draw a boxplot with ggplot2, and I want the middle line to show the mean instead of the median.

I know similar questions have been asked before, but I'm asking because the solution didn't work for me. Specifically, I followed the first solution in this accepted answer.

This is what I did with the mpg test data:

library(ggplot2)
library(tidyverse)

mpg %>%
  ggplot(aes(x = class, y = cty, middle = mean(cty))) +
  geom_boxplot()

It has no effect.

[Image: boxplot drawn with middle = mean(cty)]

[Image: boxplot drawn with the default median]

Can anyone point out what I did wrong? Thanks.

EJAg
  • Actually if you look closely at the answer that you linked, the `middle = mean()` call is inside the `geom_boxplot(aes())` call not the `ggplot` call. Try shifting it to the `boxplot` call as I don't think the `middle` specification exists for `ggplot` itself. – bob1 Oct 30 '18 at 02:07
  • `aes` can be called inside `geom_boxplot` or `ggplot` (in which case it's shared by all layers). I've tried both, though it really shouldn't make a difference, and it didn't. – EJAg Oct 30 '18 at 02:11
  • Another possibility is that the mean and median are very similar in all cases. Try this: `tapply(mpg$cty, mpg$class, summary)` – bob1 Oct 30 '18 at 02:26
  • @bob1 You're right that most of them are quite close. But at least one class, subcompact, has a noticeably different mean (20.37) than median (19). I tried other functions (e.g. max) and none had any effect. I'm starting to think the linked solution is wrong. Any idea what a solution would be without creating a new df just for this? – EJAg Oct 30 '18 at 22:54
  • @EJ2015 If you are still looking for solutions, I just answered a somewhat similar question [here](https://stackoverflow.com/questions/53684585/override-lower-upper-etc-in-boxplot-while-grouping/53699288). Solution 2 might be modified to suit your use case. – Z.Lin Dec 11 '18 at 06:44

2 Answers

Experimenting with another dataset, mtcars, shows the same thing: defining middle doesn't change the plot, even though that dataset has some larger differences between mean and median. Another option is stat_summary, although I couldn't get the points function to work quite right and had to tweak it to avoid an "arguments imply differing number of rows: 1, 0" error.

BoxMeanQuant <- function(x) {
  v <- c(min(x), quantile(x, 0.25), mean(x), quantile(x, 0.75), max(x))
  names(v) <- c("ymin", "lower", "middle", "upper", "ymax")
  v
}

mpg %>%
  ggplot(aes(x = class, y = cty)) +
  stat_summary(fun.data = BoxMeanQuant, geom = "boxplot")
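To see what stat_summary() receives from such a helper, it can be called by hand on a single class. This is a minimal sketch, not part of the original answer: `box_mean_quant_df` is a hypothetical variant that returns a one-row data frame (the shape the `fun.data` documentation describes) rather than a named vector.

```r
library(ggplot2)  # provides the mpg dataset

# Hypothetical data-frame variant of the helper above (name is illustrative)
box_mean_quant_df <- function(x) {
  data.frame(
    ymin   = min(x),
    lower  = quantile(x, 0.25, names = FALSE),
    middle = mean(x),                          # mean instead of the median
    upper  = quantile(x, 0.75, names = FALSE),
    ymax   = max(x)
  )
}

# Apply the helper to the y values of one x group, as stat_summary() would
subcompact <- mpg$cty[mpg$class == "subcompact"]
stats <- box_mean_quant_df(subcompact)
stats

# Compare the middle that would be drawn with the group's median
c(mean = stats$middle, median = median(subcompact))
```

The comments on the question report a mean of 20.37 and a median of 19 for subcompact, so this class is a convenient one to check.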

For comparison, the plain geom_boxplot() call below does not use the defined middle and still draws the median:

mpg %>% 
  ggplot(aes(x = class, y = cty)) +
  geom_boxplot(aes(middle = mean(cty)))
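To check that the mapped middle really is ignored, the values the layer computed can be inspected directly. The sketch below rests on an assumption that the answer does not state: that the default stat of geom_boxplot(), stat = "boxplot", derives middle from y itself and so replaces whatever was mapped. layer_data() is the ggplot2 helper that returns a layer's computed data.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = class, y = cty)) +
  geom_boxplot(aes(middle = mean(cty)))

# Data computed by the stat for the first (and only) layer, one row per class
computed <- layer_data(p, 1)
computed <- computed[order(computed$x), ]

# Per-class medians and means, in the same alphabetical order as the x axis
medians <- tapply(mpg$cty, mpg$class, median)
means   <- tapply(mpg$cty, mpg$class, mean)

# Side-by-side view: which statistic does the drawn middle line match?
data.frame(
  class  = names(medians),
  middle = computed$middle,
  median = as.numeric(medians),
  mean   = as.numeric(means)
)
```

If the assumption holds, the middle column matches median rather than mean, which would explain why mapping middle has no visible effect unless stat = "identity" is used.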

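This is what I used to plot the outliers as points. Here the whiskers end at the 10th and 90th percentiles and anything beyond them is drawn as a point, which differs from the geom_boxplot default (whiskers reach up to 1.5 × IQR from the box). Adjust as necessary. Without the if-else it would throw an error.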

BoxMeanQuant <- function(x) {
  v <- c(quantile(x, 0.1), quantile(x, 0.25), mean(x), quantile(x, 0.75), quantile(x, 0.9))
  names(v) <- c("ymin", "lower", "middle", "upper", "ymax")
  v
}

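outliers <- function(x) {
  if (length(x) > 5) {
    # Values outside the 10th-90th percentile range are drawn as points
    subset(x, x < quantile(x, 0.1) | quantile(x, 0.9) < x)
  } else {
    return(NA)
  }
}

ggplot(data = mpg, aes(x = class, y = cty)) +
  stat_summary(fun.data = BoxMeanQuant, geom = "boxplot") +
  # `fun` replaces `fun.y`, deprecated in ggplot2 3.3.0 (use `fun.y` on older versions)
  stat_summary(fun = outliers, geom = "point")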

Anonymous coward

In the end I had to create a summary data frame to do this. It is not what I was originally looking for, but it works.

df <- mpg %>%
  group_by(class) %>%
  summarize(ymin = min(cty), lower = quantile(cty, 0.25), middle = mean(cty),
            upper = quantile(cty, 0.75), ymax = max(cty))

# stat = "identity" makes geom_boxplot() draw the supplied values as-is
df %>%
  ggplot(aes(class)) +
  geom_boxplot(aes(ymin = ymin, lower = lower, middle = middle, upper = upper, ymax = ymax),
               stat = "identity")
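The summary above draws the whiskers out to the minimum and maximum. If whiskers closer to the geom_boxplot() default are wanted, they can be computed in the same summarize() call. This is a sketch under the assumption that the default whiskers reach the most extreme observations within 1.5 × IQR of the box; `df_tukey`, `fence_lo`, and `fence_hi` are names introduced here for illustration, and outlier points are not drawn.

```r
library(ggplot2)
library(dplyr)

df_tukey <- mpg %>%
  group_by(class) %>%
  summarize(
    lower    = quantile(cty, 0.25, names = FALSE),
    upper    = quantile(cty, 0.75, names = FALSE),
    # Fences at 1.5 * IQR beyond the box (assumed to mirror the default rule)
    fence_lo = lower - 1.5 * (upper - lower),
    fence_hi = upper + 1.5 * (upper - lower),
    # Whiskers end at the most extreme observations inside the fences
    ymin     = min(cty[cty >= fence_lo]),
    ymax     = max(cty[cty <= fence_hi]),
    middle   = mean(cty)                      # mean instead of the median
  )

# Draw the precomputed statistics without letting the stat recompute them
ggplot(df_tukey, aes(class)) +
  geom_boxplot(aes(ymin = ymin, lower = lower, middle = middle, upper = upper, ymax = ymax),
               stat = "identity")
```

Observations beyond the fences could be added as a separate geom_point() layer if they matter for the plot.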
EJAg