
I am trying to add sample sizes to boxplots (preferably at the top or bottom of each box) that are grouped by two levels. I used facet_grid() to produce a panel plot and then tried annotate() to add the sample sizes, but this did not work because it repeated the same values in the second panel. Is there a simple way to do this?

head(FeatherData, n = 10)
##    Location   Status   FeatherD               Species        ID
## 1        TX Resident  -27.41495         Carolina wren CARW (32)
## 2        TX Resident  -29.17626         Carolina wren CARW (32)
## 3        TX Resident  -31.08070         Carolina wren CARW (32)
## 4        TX  Migrant -169.19579 Yellow-rumped warbler YRWA (28)
## 5        TX  Migrant -170.42079 Yellow-rumped warbler YRWA (28)
## 6        TX  Migrant -158.66925 Yellow-rumped warbler YRWA (28)
## 7        TX  Migrant -165.55278 Yellow-rumped warbler YRWA (28)
## 8        TX  Migrant -170.43374 Yellow-rumped warbler YRWA (28)
## 9        TX  Migrant -170.21801 Yellow-rumped warbler YRWA (28)
## 10       TX  Migrant -184.45871 Yellow-rumped warbler YRWA (28)


ggplot(FeatherData, aes(x = Location, y = FeatherD)) +
  geom_boxplot(alpha = 0.7, fill = "#A4A4A4") +
  scale_y_continuous() +
  scale_x_discrete(name = "Location") +
  theme_bw() +
  theme(plot.title = element_text(size = 20, family = "Times", face = "bold"),
        text = element_text(size = 20, family = "Times"),
        axis.title = element_text(face = "bold"),
        axis.text.x = element_text(size = 15)) +
  ylab(expression(Feather~delta^2~H["f"]~"‰")) +
  facet_grid(. ~ Status)


Brian
AMaldonado

1 Answer


There are multiple ways to do this sort of task. The most flexible is to compute your statistic outside the plotting call as a separate data frame and add it as its own layer:

library(dplyr)
library(ggplot2)

cw_summary <- ChickWeight %>% 
  group_by(Diet) %>% 
  tally()

cw_summary
## # A tibble: 4 x 2
##     Diet     n
##   <fctr> <int>
## 1      1   220
## 2      2   120
## 3      3   120
## 4      4   118

ggplot(ChickWeight, aes(Diet, weight)) +
  geom_boxplot() +
  facet_grid(~Diet) +
  geom_text(data = cw_summary,
            aes(Diet, Inf, label = n), vjust = 1)

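The question groups by two levels (one on the x axis, one as the facet), so the summary needs one row per combination of both. Below is a minimal sketch of that case. Since FeatherData isn't available here, it assumes the built-in mtcars data as a stand-in: cyl plays the role of Location and am the role of Status. The names cars, n_summary and p are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Stand-in data (assumption): cyl ~ Location (x axis), am ~ Status (facet).
cars <- mtcars %>%
  mutate(cyl = factor(cyl),
         am  = factor(am, labels = c("Automatic", "Manual")))

# One row per x-level/facet combination, with the group size in column n.
n_summary <- cars %>%
  count(cyl, am)

p <- ggplot(cars, aes(cyl, mpg)) +
  geom_boxplot() +
  facet_grid(. ~ am) +
  # n_summary contains the facet variable (am), so each label is drawn only
  # in its own panel; y = Inf plus vjust places it just below the top edge.
  geom_text(data = n_summary,
            aes(x = cyl, y = Inf, label = paste0("n = ", n)),
            vjust = 1.5)

p
```

Because the label layer carries the faceting column, the counts are not repeated across panels the way annotate() values are. For the original data, the equivalent summary would presumably be count(Location, Status).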

The other method is to use the built-in summary functions, but that can be fiddly. Here's an example:

ggplot(ChickWeight, aes(Diet, weight)) + 
  geom_boxplot() +
  stat_summary(fun.y = median, fun.ymax = length,
               geom = "text", aes(label = ..ymax..), vjust = -1) +
  facet_grid(~Diet)


Here I used fun.y to position the label at the median of the y values, and fun.ymax to compute an internal variable called ..ymax.. with the function length, which simply counts the observations in each group.
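In ggplot2 3.3.0 and later, fun.y and fun.ymax are deprecated in favor of fun and fun.max, and the ..ymax.. notation has been superseded by after_stat(ymax). A sketch of the same plot with the newer names, assuming a ggplot2 version of at least 3.3.0 (p is just an illustrative variable name):

```r
library(ggplot2)

# Assumes ggplot2 >= 3.3.0, where fun/fun.max and after_stat() are available.
p <- ggplot(ChickWeight, aes(Diet, weight)) +
  geom_boxplot() +
  # fun places the label at the median; fun.max = length stores the group
  # size in the computed variable ymax, which is then used as the label.
  stat_summary(fun = median, fun.max = length,
               geom = "text", aes(label = after_stat(ymax)), vjust = -1) +
  facet_grid(~Diet)

p
```

The trick is unchanged: ymax is simply borrowed as a slot to carry the count, so it should not also be used for anything that draws an upper limit.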

Brian