
I would like to plot the following data frame as a boxplot:

df <- structure(list(gender = c("M", "M", "F", "F", "F", "M", "M", 
"M", "F", "F", "M", "M", "F", "F", "M"), age = c(0.047843262001096, 
-0.742811587141869, 0.925285031087175, 0.0921270156895479, -0.869460001218705, 
-0.468079587412729, -0.480948831743091, 0.879330955606316, -0.249821527515907, 
0.386670185484268, 0.670264658324484, 0.382273448950274, 0.0500787926732772, 
2.39384630378084, 0.862479212110272)), row.names = c(NA, -15L
), class = c("tbl_df", "tbl", "data.frame"))

I would like to replace the red point that marks the mean with a horizontal dashed line:

ggplot(df, aes(x = gender, y = age)) + geom_boxplot() + 
stat_summary(fun.data = mean_sdl, geom = "point", color = "red") 

I have tried "line" instead of "point", but it doesn't seem to produce anything. Any ideas?

NelsonGon
Omry Atia
  • Why do you want such a line? You can try `geom_hline()`. Can't test, no access to R. – NelsonGon Nov 24 '19 at 13:49
  • I want to mark the mean which can be very different from the median. geom_hline doesn't fit the syntax here. – Omry Atia Nov 24 '19 at 13:51
  • What do you mean `geom_hline` doesn't fit the syntax? You can feed the data via `aes`. However, there must be a better and more conventional way to present your data. I can't see what you mean by the mean being different from the median, or how that relates to `geom_hline`. Could you add a sample of your expected plot? – NelsonGon Nov 24 '19 at 13:54
  • Does this answer your question? [Joining means on a boxplot with a line (ggplot2)](https://stackoverflow.com/questions/3989987/joining-means-on-a-boxplot-with-a-line-ggplot2) – Cole Nov 24 '19 at 14:00

3 Answers


Edited based on the comment: this can be done entirely with stat_summary if desired. fun.data could be used as well.

ggplot(df, aes(x = gender, y = age, yintercept = mean(age))) +
  geom_boxplot() +
  stat_summary(fun.y = mean, color = "yellow", size = 3, geom = "hline", linetype = 2)

Since you've shown that you want a line for each boxplot, you can do that entirely within ggplot2, without defining an additional function:

ggplot(df, aes(x = gender, y = age, width = 0.75)) +
  geom_boxplot() +
  stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y..), linetype = 2)

graph example with mean dashes
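
On newer ggplot2 releases the `fun.y` argument and the `..y..` notation may trigger deprecation warnings. A possible equivalent is sketched below, assuming ggplot2 3.3.0 or later (where `fun` and `after_stat()` are available). The data frame `d` is a made-up stand-in for illustration, not the asker's `df`, and the width is passed to each layer directly rather than through `aes()`.

```r
library(ggplot2)

# Hypothetical stand-in data (not the asker's df)
d <- data.frame(
  gender = rep(c("F", "M"), each = 4),
  age    = c(-0.9, 0.1, 0.4, 2.4, -0.7, 0.0, 0.4, 0.9)
)

p <- ggplot(d, aes(x = gender, y = age)) +
  geom_boxplot(width = 0.75) +
  # Summarise each group to its mean, then draw a zero-height error bar there
  stat_summary(
    fun = mean, geom = "errorbar",
    aes(ymin = after_stat(y), ymax = after_stat(y)),
    width = 0.75, linetype = "dashed", colour = "red"
  )

p
```

Because `ymin` and `ymax` are both mapped to the computed mean, the error bar collapses to a single dashed horizontal segment per box.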

caldwellst
  • Better not to use `$` in `ggplot2`. Try: `geom_hline(aes(yintercept = mean(age)), linetype = 2)`. Using `$` might lead to different results, especially for grouped data. – NelsonGon Nov 24 '19 at 13:59

You could use base boxplot() together with arrows(), setting length=0 to suppress the arrow heads. The box borders appear to lie at a distance of 0.4 from the box centers. Use aggregate() to get the group means.

df <- as.data.frame(df)
boxplot(age ~ gender, df)
a <- aggregate(age ~ gender, df, mean)
arrows(1 - .4, a[1, 2], 1 + .4, length=0, lty=2, lwd=2, col=2)
arrows(2 - .4, a[2, 2], 2 + .4, length=0, lty=2, lwd=2, col=2)
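
The two arrows() calls could also be collapsed into one vectorized segments() call, which would scale to any number of groups. This is only a sketch: the data frame `d` is hypothetical, and the 0.4 half-width assumes boxplot() is called with its default box width.

```r
# Hypothetical stand-in data (not the asker's df)
d <- data.frame(
  gender = factor(rep(c("F", "M"), each = 4)),
  age    = c(-0.9, 0.1, 0.4, 2.4, -0.7, 0.0, 0.4, 0.9)
)

boxplot(age ~ gender, data = d)

# One mean per group, in the same (factor level) order as the boxes
means <- tapply(d$age, d$gender, mean)
pos   <- seq_along(means)  # boxes are drawn at x = 1, 2, ...

# One dashed horizontal segment per box, spanning the assumed box width
segments(pos - 0.4, means, pos + 0.4, means, lty = 2, lwd = 2, col = 2)
```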


jay.sf

In stat_summary use the Mean function shown below with the boxplot geom. Also set width to ensure that both geoms have the same width.

Mean <-  function(x) {
  setNames(rep(mean(x), 5), c("ymin", "lower", "middle", "upper", "ymax"))
}

ggplot(df, aes(x = gender, y = age, width = 0.75)) +
   geom_boxplot() +
   stat_summary(fun.data = Mean, geom = "boxplot", linetype = "dashed")
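
To see why this draws a flat line, it may help to call Mean on its own. The toy input below is made up for illustration; all five boxplot statistics come back equal to the mean, so the second "box" has zero height.

```r
Mean <- function(x) {
  setNames(rep(mean(x), 5), c("ymin", "lower", "middle", "upper", "ymax"))
}

# Toy input whose mean is 3
stats <- Mean(c(1, 2, 6))
print(stats)  # every element (ymin, lower, middle, upper, ymax) equals 3
```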


ADDED:

In his answer, @caldwellst shows that the errorbar geom can be used. That also works in the code above: simply replace geom="boxplot" with geom="errorbar" in stat_summary. Compared with geom="boxplot", this has the advantage of producing nicer-looking dashed lines.

Although Mean as defined above would work as is with errorbar, we really only need the ymin and ymax components of its output, so we can reduce it as shown:

Mean <- function(x) c(ymin = mean(x), ymax = mean(x))

ggplot(df, aes(x = gender, y = age, width = 0.75)) +
   geom_boxplot() +
   stat_summary(fun.data = Mean, geom = "errorbar", linetype = "dashed")

This could also be written by replacing Mean with the formula shown below; fn$ from gsubfn converts the formula to a function whose body is its right-hand side.

library(gsubfn)

ggplot(df, aes(x = gender, y = age, width = 0.75)) +
   geom_boxplot() +
   fn$stat_summary(fun.data = ~ c(ymin = mean(x), ymax = mean(x)), 
     geom = "errorbar", linetype = "dashed")
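
If an extra package dependency is not wanted, an ordinary anonymous function should do the same job. The sketch below assumes a hypothetical stand-in data frame `d` (not the asker's `df`) and returns a one-row data frame from the summary function, with the width passed to each layer directly.

```r
library(ggplot2)

# Hypothetical stand-in data (not the asker's df)
d <- data.frame(
  gender = rep(c("F", "M"), each = 4),
  age    = c(-0.9, 0.1, 0.4, 2.4, -0.7, 0.0, 0.4, 0.9)
)

p <- ggplot(d, aes(x = gender, y = age)) +
  geom_boxplot(width = 0.75) +
  # Inline summary function: both bar ends sit at the group mean
  stat_summary(
    fun.data = function(x) data.frame(ymin = mean(x), ymax = mean(x)),
    geom = "errorbar", width = 0.75, linetype = "dashed"
  )

p
```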
G. Grothendieck