
This might seem like a duplicate of this question, but in fact I want to expand the original question.

I want to annotate the boxplot with the number of observations per group AND SUBGROUP in ggplot. Following the example of the original post, here is my minimal example:

require(ggplot2)

give.n <- function(x){
  return(c(y = median(x)*1.05, label = length(x))) 
  # experiment with the multiplier to find the perfect position
}

ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(gear))) +
  geom_boxplot() +
  stat_summary(fun.data = give.n, geom = "text", fun.y = median)

My problem is that the sample counts all line up at the center of each group instead of being plotted over the appropriate boxplot, as the picture below shows:

[Image: annotations are centered in the middle of the group rather than plotted over the appropriate boxplot]

Ratnanil

2 Answers


Is this what you want?

require(ggplot2)

give.n <- function(x){
  return(c(y = median(x)*1.05, label = length(x))) 
  # experiment with the multiplier to find the perfect position
}

ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(gear))) +
  geom_boxplot() +
  # dodge the labels so that each count sits over its own subgroup's box
  stat_summary(fun.data = give.n, geom = "text",
               position = position_dodge(width = 0.75))


MLavoie
  • Exactly what I needed, thank you! Can I post a follow-up question? Please request deletion of this comment if this is bad practice. Otherwise: using a factor to determine the position of the text relative to the group median leads to unwanted behaviour if the values are far apart. Using a fixed offset (e.g. median(x)+5) makes the function usable for only one range of values. Is there a way to determine the y value of the text within the stat_summary() command? – Ratnanil Jan 17 '16 at 15:16
  • Thanks for accepting the answer! As for your comment, I am not sure; I personally tend to do this more manually. If you want to put the label exactly where you want it, geom_text() is probably the best option. – MLavoie Jan 17 '16 at 15:59
  • Alternative: instead of using a multiple of the median per subgroup, is there a way to access the median of the whole dataset and use a multiple of that value? – Ratnanil Jan 19 '16 at 09:00
  • I think it's possible; take a look at this http://docs.ggplot2.org/dev/geom_boxplot.html and scroll down, you will see an example where you can draw a boxplot with your own computations – MLavoie Jan 19 '16 at 10:02
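One possible way to set the label height from within stat_summary(), as asked in the comments above, is to pass it to the summary function through the fun.args argument. The sketch below is only an illustration: give.n.at, its y_pos parameter and label_height are hypothetical names not used in the original posts, and placing the labels 5% above the largest mpg value is an arbitrary assumption. A multiple of median(mtcars$mpg) could be passed in the same way.

```r
library(ggplot2)

# Hypothetical variant of give.n: the label height is passed in as `y_pos`
# instead of being derived from each subgroup's median.
give.n.at <- function(x, y_pos) {
  data.frame(y = y_pos, label = length(x))
}

# Assumption: put every count slightly above the largest mpg in the whole
# dataset; replace with e.g. median(mtcars$mpg) * 1.05 if preferred.
label_height <- max(mtcars$mpg) * 1.05

p <- ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(gear))) +
  geom_boxplot() +
  stat_summary(fun.data = give.n.at,
               fun.args = list(y_pos = label_height),  # forwarded to give.n.at
               geom = "text",
               position = position_dodge(width = 0.75))

print(p)
```

Because the height is computed once from the whole dataset, all counts end up on one horizontal line, which avoids labels drifting when subgroup medians are far apart.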

In case anyone else is having trouble positioning the text at suitable locations, here is my modification to the answer from @MLavoie :

require(ggplot2)

give.n <- function(x){
  
  # Calculate the third quartile (q3) and the distance between the median and
  # q3:
  q3 <- quantile( x, probs = 0.75, names = FALSE )
  distance_between_median_and_q3 <- q3 - median(x)
  
  # If the distance between the median and the 3rd quartile is large enough,
  # place the text halfway between the median and the 3rd quartile:
  if( distance_between_median_and_q3 > 0.8 ){
    return( c( 
      y = median(x) + distance_between_median_and_q3 / 2
      , label = length(x) )) 
  } else{
    # If the distance is too small, either:
    
    # 1) place the text above the upper whisker *as long as* IQR > 0,
    if( IQR(x) > 0 ){
      # the whisker ends at the largest value within 1.5 * IQR of q3
      upper_whisker <- max( x[ x <= (q3 + 1.5 * IQR(x)) ])
      
      return( c( 
        y = upper_whisker * 1.03
        , label = length(x) )) 
    } else{
      # or 
      # 2) place the text above the median
      return( c( 
        y = median(x) * 1.03
        , label = length(x) )) 
    }
  }
}

ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(gear))) +
  geom_boxplot() +
  stat_summary( fun.data = give.n
                , geom = "text"
                # , fun.y = median
                , position = position_dodge( width = 0.75 ) 
  )

Please note that you might have to experiment with some of the values or code in the give.n function to get it to work with your data. But as you can see, it is possible to make give.n quite flexible.
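If tuning give.n for each dataset becomes tedious, one alternative is the geom_text() route mentioned in the comments on the other answer: compute the counts up front and draw them at a height chosen from the data. This is a minimal sketch under stated assumptions, not part of the original answers: the counts data frame, its n and y columns, and the 5% offset above the maximum are all illustrative choices.

```r
library(ggplot2)

# Count the observations in each cyl/gear subgroup.
counts <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = length)
names(counts)[names(counts) == "mpg"] <- "n"

# Assumption: one shared label height, just above the largest mpg value.
counts$y <- max(mtcars$mpg) * 1.05

p <- ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(gear))) +
  geom_boxplot() +
  geom_text(data = counts,
            aes(x = factor(cyl), y = y, label = n, group = factor(gear)),
            position = position_dodge(width = 0.75),  # same dodge as above
            inherit.aes = FALSE)

print(p)
```

The explicit group aesthetic is what lets position_dodge() separate the labels by gear within each cyl group, since the text layer does not inherit the fill mapping here.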


E. Nygaard