Messing around with another dataset, mtcars, shows the same thing: defining middle doesn't change it, even though that dataset has some larger differences between mean and median. Another option is using stat_summary, although I can't get the points function to work just right, and I had to tweak it to avoid an "arguments imply differing number of rows: 1, 0" error.
library(ggplot2)
library(dplyr)  # for %>%

# Five-number summary for geom = "boxplot", with the mean as the middle line
BoxMeanQuant <- function(x) {
  v <- c(min(x), quantile(x, 0.25), mean(x), quantile(x, 0.75), max(x))
  names(v) <- c("ymin", "lower", "middle", "upper", "ymax")
  v
}

mpg %>%
  ggplot(aes(x = class, y = cty)) +
  stat_summary(fun.data = BoxMeanQuant, geom = "boxplot")
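A quick way to convince yourself the summary function does what you want is to call it on a plain vector before handing it to stat_summary. This is only a sketch; the skewed vector x is made up for illustration, and the function is repeated so the snippet runs on its own.

```r
# Same summary function as above, repeated so this runs standalone
BoxMeanQuant <- function(x) {
  v <- c(min(x), quantile(x, 0.25), mean(x), quantile(x, 0.75), max(x))
  names(v) <- c("ymin", "lower", "middle", "upper", "ymax")
  v
}

# Hypothetical skewed data where mean and median are far apart
x <- c(1, 2, 3, 4, 100)

stats <- BoxMeanQuant(x)
print(stats)

# The "middle" entry is the mean, not the median
stats[["middle"]] == mean(x)
stats[["middle"]] == median(x)
```

With data this skewed the middle line should sit well above the median, which makes the difference easy to spot in a plot.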

Compare that to the normal geom_boxplot, which does not use the defined middle.
# The mapped middle is ignored here; the line is still drawn at the median
mpg %>%
  ggplot(aes(x = class, y = cty)) +
  geom_boxplot(aes(middle = mean(cty)))
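As far as I can tell, geom_boxplot computes its own five statistics through its default stat, which would explain why a mapped middle gets overridden. If you compute the statistics up front instead, stat = "identity" tells geom_boxplot to draw them as given. A minimal sketch, assuming dplyr is available; box_stats and p are names I made up for this example.

```r
library(ggplot2)
library(dplyr)

# Precompute one row of box statistics per class, using the mean as the middle
box_stats <- mpg %>%
  group_by(class) %>%
  summarise(
    ymin   = min(cty),
    lower  = quantile(cty, 0.25, names = FALSE),
    middle = mean(cty),
    upper  = quantile(cty, 0.75, names = FALSE),
    ymax   = max(cty)
  )

# stat = "identity" draws the supplied values instead of recomputing them
p <- ggplot(box_stats, aes(x = class)) +
  geom_boxplot(
    aes(ymin = ymin, lower = lower, middle = middle, upper = upper, ymax = ymax),
    stat = "identity"
  )
p
```

The trade-off is that outliers are not drawn at all this way, so you would need a separate layer for them.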

This is what I was using to plot the outliers as points, but they differ from whatever geom_boxplot uses by default. You can adjust as necessary. Without the if-else, it would throw an error.
# Whiskers at the 10th and 90th percentiles, mean as the middle line
BoxMeanQuant <- function(x) {
  v <- c(quantile(x, 0.1), quantile(x, 0.25), mean(x), quantile(x, 0.75), quantile(x, 0.9))
  names(v) <- c("ymin", "lower", "middle", "upper", "ymax")
  v
}

# Points outside the 10th-90th percentile range; NA for small groups
outliers <- function(x) {
  if (length(x) > 5) {
    subset(x, x < quantile(x, 0.1) | quantile(x, 0.9) < x)
  } else {
    NA
  }
}

ggplot(data = mpg, aes(x = class, y = cty)) +
  stat_summary(fun.data = BoxMeanQuant, geom = "boxplot") +
  # fun replaces fun.y, which is deprecated as of ggplot2 3.3.0
  stat_summary(fun = outliers, geom = "point")
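To see what the if-else is guarding against, it helps to run the outlier logic on plain vectors. This is a sketch with made-up data (big_group and small_group are hypothetical); my guess is that the error comes from a group where nothing falls outside the 10%-90% range, so the function returns a zero-length vector.

```r
# Same helper as above, repeated so this runs standalone
outliers <- function(x) {
  if (length(x) > 5) {
    subset(x, x < quantile(x, 0.1) | quantile(x, 0.9) < x)
  } else {
    NA
  }
}

# Hypothetical group with one low and one high extreme value
big_group <- c(1, 10, 11, 12, 13, 14, 15, 16, 17, 30)
found <- outliers(big_group)
print(found)

# Hypothetical small group: the guard returns NA instead of subsetting
small_group <- c(15, 15, 15, 16, 16)
guarded <- outliers(small_group)
print(guarded)

# Without the guard, the same small group yields a zero-length result
unguarded <- subset(
  small_group,
  small_group < quantile(small_group, 0.1) | quantile(small_group, 0.9) < small_group
)
length(unguarded)
```

Note that the guard checks group size rather than whether the result is empty, so a larger group with many tied values could presumably still come back empty. Checking length() of the subset directly might be the more robust test.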
