11

I'd like to add results of a Tukey.HSD post-hoc test to a ggplot2 boxplot. This SO answer contains a manual example of what I want (i.e., the letters on the plot were added manually; groups which share a letter are indistinguishable, p>whatever).

enter image description here

Is there an automatic function add letters like these to a boxplot, based on AOV and Tukey HSD post-hoc analyis?

I think it would not be too hard to write such a function. It would look something like this:

set.seed(0)
lev <- gl(3, 10)
y <- c(rnorm(10), rnorm(10) + 0.1, rnorm(10) + 3)
d <- data.frame(lev=lev, y=y)

p_base <- ggplot(d, aes(x=lev, y=y)) + geom_boxplot() 

a <- aov(y~lev, data=d)
tHSD <- TukeyHSD(a)

# Function to generate a data frame of factor levels and corresponding labels
generate_label_df <- function(HSD, factor_levels) {
  comparisons <- rownames(HSD$l)
  p.vals <- HSD$l[ , "p adj"]

  ## Somehow create a vector of letters
  labels <- # A vector of letters, one for each factor level, generated using `comparisons` and `p.vals`
  letter_df <- data.frame(lev=factor_levels, labels=labels)
  letter_df
}

# Add the labels to the plot
p_base + 
  geom_text(data=generate_label_df(tHSD), aes(x=l, y=0, label=labels))

I realize that the TukeyHSD object has a plot method, and there is another package (which I can't now seem to find) which does what I'm describing in base graphics, but I would really prefer to do this in ggplot2.

Community
  • 1
  • 1
Drew Steen
  • 16,045
  • 12
  • 62
  • 90

1 Answers1

14

You can use 'multcompLetters' from the 'multcompView' package to generate letters of homologous groups after a Tukey HSD test. From there, it's a matter of extracting the group labels corresponding to each factor tested in the Tukey HSD, as well as the upper quantile as displayed in the boxplot in order to place the label just above this level.

library(plyr)
library(ggplot2)
library(multcompView)

set.seed(0)
lev <- gl(3, 10)
y <- c(rnorm(10), rnorm(10) + 0.1, rnorm(10) + 3)
d <- data.frame(lev=lev, y=y)

a <- aov(y~lev, data=d)
tHSD <- TukeyHSD(a, ordered = FALSE, conf.level = 0.95)

generate_label_df <- function(HSD, flev){
 # Extract labels and factor levels from Tukey post-hoc 
 Tukey.levels <- HSD[[flev]][,4]
 Tukey.labels <- multcompLetters(Tukey.levels)['Letters']
 plot.labels <- names(Tukey.labels[['Letters']])

 # Get highest quantile for Tukey's 5 number summary and add a bit of space to buffer between    
 # upper quantile and label placement
    boxplot.df <- ddply(d, flev, function (x) max(fivenum(x$y)) + 0.2)

 # Create a data frame out of the factor levels and Tukey's homogenous group letters
  plot.levels <- data.frame(plot.labels, labels = Tukey.labels[['Letters']],
     stringsAsFactors = FALSE)

 # Merge it with the labels
   labels.df <- merge(plot.levels, boxplot.df, by.x = 'plot.labels', by.y = flev, sort = FALSE)

return(labels.df)
}

Generate ggplot

 p_base <- ggplot(d, aes(x=lev, y=y)) + geom_boxplot() +
  geom_text(data = generate_label_df(tHSD, 'lev'), aes(x = plot.labels, y = V1, label = labels))

Boxplot with automatic Tukey HSD group label placement

  • I am trying to use this function, but I have found a few errors in the `generate_label_df` function. First, I think a third variable `d` needs to be added, which is the original data. Second, the V1 column is generating `NA`s. – Skiea Nov 07 '20 at 23:43
  • How can you draw boxplots and letters with same color? – Darwin PC Oct 14 '22 at 03:19