I'm trying to create boxplots with descriptive information (mean, count, etc.). I found a lot of examples of how to add these numbers to a single boxplot with different groups, but I didn't find a way to add them to a grid of multiple boxplots (facet_wrap).

For example, this article describes how to add the numbers to a single boxplot; I'm trying to do the same for multiple boxplots.

library(reshape2)
library(ggplot2)
df.m <- melt(iris, id.var = "Species")
p <- ggplot(data = df.m, aes(x=variable, y=value)) + 
  geom_boxplot(aes(fill=Species))
p + facet_wrap( ~ variable, scales="free")

On top of this plot, I want to add the relevant descriptive information above each box.

LHA
  • Can you add a reproducible example? I might know what's going on – Jeff Bezos May 04 '20 at 14:58
  • This includes sample code you've attempted (including listing non-base R packages, and any errors/warnings received), sample *unambiguous* data (e.g., `dput(head(x))` or `data.frame(x=...,y=...)`), and intended output. Refs: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. – r2evans May 04 '20 at 15:53

1 Answer

Create the function that makes counts and means

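library(reshape2)  # for melt()
library(ggplot2)

# Build the label text (count and mean) and its y position for one box
stat_box_data <- function(y) {
  return( 
    data.frame(
      y = 0.5 + 1.1 * max(y),  # may need to modify this depending on your data
      label = paste('count =', length(y), '\n',
                    'mean =', round(mean(y), 1), '\n')
    )
  )
}

# Long format: one row per Species/variable/value combination
df.m <- melt(iris, id.var = "Species")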

If you have large outliers, you may want to replace the y = 0.5 + 1.1 * max(y) line above with this or something similar:

y=quantile(y,probs=0.95)*1.1,
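
A minimal sketch of that quantile-based variant, assuming the same label format as in stat_box_data above. The name stat_box_data_q is hypothetical and only used here for illustration, and the 0.95 and 1.1 values are starting points you would likely tune for your own data.

```r
# Hypothetical variant of stat_box_data: the label is placed just above
# the 95th percentile of the group instead of above its maximum.
stat_box_data_q <- function(y) {
  data.frame(
    # unname() drops the "95%" name that quantile() attaches to its result
    y = unname(quantile(y, probs = 0.95)) * 1.1,
    label = paste('count =', length(y), '\n',
                  'mean =', round(mean(y), 1), '\n')
  )
}

# Quick look at the result for a vector with one large outlier
stat_box_data_q(c(1:99, 1000))
```

It can be passed to stat_summary in the same way, as fun.data = stat_box_data_q.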

Plot the data and use stat_summary with your custom function

ggplot(data = df.m, aes(x = Species, y = value)) + 
  geom_boxplot(aes(fill = Species)) +
  stat_summary(
    fun.data = stat_box_data, 
    geom = "text", 
    hjust = 0.5,
    vjust = 0.9
  ) + 
  facet_wrap(~ variable, scales = "free")
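
To see why the labels follow each box, it may help to call the helper by hand. stat_summary applies the fun.data function to the y values of each group within each panel, so the sketch below mimics that for the Sepal.Length facet using base R only. The helper is repeated so the snippet runs on its own; setosa_sl, one_label and per_species are illustrative names, not part of the answer.

```r
# Same helper as in the answer, repeated so this snippet is self-contained
stat_box_data <- function(y) {
  data.frame(
    y = 0.5 + 1.1 * max(y),
    label = paste('count =', length(y), '\n',
                  'mean =', round(mean(y), 1), '\n')
  )
}

# One group by hand: Sepal.Length for setosa only
setosa_sl <- iris$Sepal.Length[iris$Species == "setosa"]
one_label <- stat_box_data(setosa_sl)
cat(one_label$label)

# Every Species within the Sepal.Length facet: one row (one label) per box
per_species <- do.call(
  rbind,
  lapply(split(iris$Sepal.Length, iris$Species), stat_box_data)
)
per_species
```

Because the y position is computed from each group's own values, every box gets its own label height, which is what keeps the text sensible when the facets use scales = "free".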

CrunchyTopping
  • `upper_limit = max(iris$Sepal.Length)` is a constant. Is there any way to define a dynamic position for each layer? With the iris dataset it looks good, but with more complicated tables this can be a problem. – LHA May 05 '20 at 07:27
  • Edited for larger variability in the y variable, please accept the answer if that solves the problem – CrunchyTopping May 05 '20 at 12:09
  • Thanks! That's perfect. – LHA May 05 '20 at 12:35