
I am doing a basic boxplot where y=age and x=Patient groups

age <- ggplot(data, aes(factor(group2), age))  + ylim(15, 80) 
age + geom_boxplot(fill = "grey80", colour = "#3366FF")

I was hoping you could help me out with a few things:

1) Is it possible to show the number of observations per group above each group's boxplot (but NOT on the x axis, where my group labels are) without having to do this in Paint :)? I have tried using:

age + annotate("text", x = "CON", y = 60, label = "25")

where CON is the 1st group and y = 60 is just above the boxplot for this group. However, the command didn't work. I assume this has something to do with x being read as a continuous rather than a categorical variable.

2) Also, although there are plenty of questions about using the mean rather than the median for boxplots, I still haven't found code that works for me.

3) On the same matter is there a way you could include the mean group stat in the boxplot? Perhaps using

age + stat_summary(fun = mean, colour = "red", geom = "point")  # fun.y was renamed to fun in ggplot2 3.3.0

which however only includes a dot of where the mean lies. Or again using

age + annotate("text", x = "CON", y = 30, label = "30")

where CON is the 1st group and y = 30 is ~ the group age mean. Knowing how flexible and rich ggplot2 syntax is I was hoping that there is a more elegant way of using the real stats output rather than annotate.

Any suggestions/links would be much appreciated!

Thanks!!

Ben
user1442363
  • A boxplot normally has min, lower, middle and upper quantiles and finally a max value. You already have the .25, .5 and .75 quantiles. Isn't this informative enough? – Arun Mar 27 '13 at 14:23
  • This is the format I am asked for. – user1442363 Mar 27 '13 at 14:29

3 Answers


Is this anything like what you're after? With stat_summary, as requested:

library(ggplot2)

# label with the number of observations, placed just above the median line
give.n <- function(x) {
  # experiment with the multiplier to find the perfect position
  data.frame(y = median(x) * 1.05, label = length(x))
}

# label with the group mean, placed just below the median line
mean.n <- function(x) {
  # experiment with the multiplier to find the perfect position
  data.frame(y = median(x) * 0.97, label = round(mean(x), 2))
}

# plot
ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot(fill = "grey80", colour = "#3366FF") +
  stat_summary(fun.data = give.n, geom = "text") +
  stat_summary(fun.data = mean.n, geom = "text", colour = "red")

The black number is the number of observations; the red number is the mean value. joran's answer shows how to put the numbers at the top of the boxes.

hat-tip: https://stackoverflow.com/a/3483657/1036500
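As a variation on the code above, here is a minimal sketch that writes the count as an `n = 11` style label at the top of each group and marks the mean with a red point. The helper name `n_label` is made up for illustration, and the `fun` argument assumes ggplot2 3.3.0 or later (older versions call it `fun.y`).

```r
library(ggplot2)

# Hypothetical helper: label "n = <count>" positioned at the group's maximum
n_label <- function(x) {
  data.frame(y = max(x), label = paste0("n = ", length(x)))
}

p <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot(fill = "grey80", colour = "#3366FF") +
  # vjust = -0.5 nudges the text slightly above the highest observation
  stat_summary(fun.data = n_label, geom = "text", vjust = -0.5) +
  # red point at the group mean
  stat_summary(fun = mean, geom = "point", colour = "red")

print(p)
```

Because the label sits at `max(x)`, it lands above any outliers as well as the whisker; swap in `quantile(x, 0.75)` if it should sit on the box instead.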

Ben
  • For a variation on this answer that includes how to annotate with `n = 11`, etc., see here: http://stackoverflow.com/a/15720769/1036500 – Ben Mar 30 '13 at 16:35

I think this is what you're looking for maybe?

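library(plyr)     # for ddply() and summarise
library(ggplot2)

# compute the five boxplot statistics and the group size per cylinder count
myboxplot <- ddply(mtcars,
                    .(cyl),
                    summarise,
                    min = min(mpg),
                    q1 = quantile(mpg, 0.25),
                    med = median(mpg),
                    q3 = quantile(mpg, 0.75),
                    max = max(mpg),
                    lab = length(cyl))

# draw the boxes from the precomputed values and put the count on top
ggplot(myboxplot, aes(x = factor(cyl))) +
    geom_boxplot(aes(lower = q1, upper = q3, middle = med, ymin = min, ymax = max), stat = "identity") +
    geom_text(aes(y = max, label = lab), vjust = 0)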


I just realized I mistakenly used the median when you were asking about the mean, but you can obviously use whatever function for the middle aesthetic you please.
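To make that last point concrete, here is a rough sketch of the same precomputed-statistics approach with the mean passed as the middle aesthetic. It uses base R instead of plyr only to stay self-contained; the names `groups`, `box_stats` and `mid` are illustrative and not part of the original answer.

```r
library(ggplot2)

# Split mpg by cylinder count, then summarise each group
groups <- split(mtcars$mpg, mtcars$cyl)
box_stats <- data.frame(
  cyl = factor(names(groups)),
  min = sapply(groups, min),
  q1  = sapply(groups, quantile, probs = 0.25),
  mid = sapply(groups, mean),   # mean instead of median for the middle line
  q3  = sapply(groups, quantile, probs = 0.75),
  max = sapply(groups, max),
  n   = sapply(groups, length),
  row.names = NULL
)

p <- ggplot(box_stats, aes(x = cyl)) +
  geom_boxplot(aes(ymin = min, lower = q1, middle = mid, upper = q3, ymax = max),
               stat = "identity") +
  geom_text(aes(y = max, label = paste("n =", n)), vjust = -0.5)

print(p)
```

Note that the whiskers here run to the raw minimum and maximum, as in the code above, so no points are drawn separately as outliers.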

joran
  • Sorry, one final question. Would it be possible to change the order of the groups? Unfortunately I am not interested in a numeric or data driven order. The only way of doing it I can think of is recoding the group variable. Your help will be much appreciated! thanks again! – user1442363 Mar 27 '13 at 14:41
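On the ordering question in the comment above: one common approach, sketched here as a suggestion rather than as part of the original answer, is to set the factor levels explicitly in the order the groups should appear. The order 8, 4, 6 and the names `cars` and `cyl_ordered` are arbitrary choices for illustration.

```r
library(ggplot2)

# Work on a copy so the built-in mtcars data set is left untouched
cars <- mtcars

# The order of `levels` is the left-to-right order of the boxes
cars$cyl_ordered <- factor(cars$cyl, levels = c(8, 4, 6))

p <- ggplot(cars, aes(cyl_ordered, mpg)) +
  geom_boxplot(fill = "grey80", colour = "#3366FF")

print(p)
```

This only changes how the groups are ordered, not the underlying values, so it avoids recoding the group variable itself.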

Answer to the first problem: to show a value above a box, provide the x position as a number rather than as a level name. So, to place the value above the first box, use x = 1.

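library(ggplot2)
data(ToothGrowth)

# x = 1 is the position of the first factor level on the discrete x axis
ggplot(ToothGrowth, aes(supp, len)) + geom_boxplot() +
   annotate("text", x = 1, y = 32, label = 30)
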
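The same idea can be extended so that the counts come from the data instead of being typed in by hand. This is only a sketch: the names `counts` and `y_top` are made up here, and it assumes the boxes sit at positions 1, 2, ... in factor-level order, which is how ggplot2 places discrete levels by default.

```r
library(ggplot2)

# Count observations per level of supp; gives columns `supp` and `Freq`
counts <- as.data.frame(table(supp = ToothGrowth$supp))

# Numeric x positions, one per level, in level order
counts$x <- seq_len(nrow(counts))

# Put every label a little above the largest observation
y_top <- max(ToothGrowth$len) + 1

p <- ggplot(ToothGrowth, aes(supp, len)) +
  geom_boxplot() +
  annotate("text", x = counts$x, y = y_top, label = paste("n =", counts$Freq))

print(p)
```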
Didzis Elferts
  • Hi! Great, thanks. I actually tried both numeric/level initially but for some reason nothing worked. Now it's fine, thanks a lot. – user1442363 Mar 27 '13 at 14:20
  • The annotate command is perfect to fix positioning issues! Thanks so much – Mac May 14 '16 at 13:24