Significance lines in box plot

Question

I'm trying to add significance lines in a box plot. I used the solutions from here and here. So I have this code:

ggplot(treatmentonly, aes(y = BagChange_pr, group = B, x = B )) +
  stat_boxplot (geom="errorbar", width = 0.5) +
  geom_boxplot (fill=c("#d55e00", "#cc79a7", "#0072b2", "#f0e442", "#009e73"), 
                outlier.size = 2, outlier.shape = 1) +
  geom_line (aes(x = c(1, 1:4, 4), 
                 y = c(10, 11, 11, 11, 11, 10)), 
             inherit.aes = FALSE )+
  annotate("text", x = 2.5, y = 11.5, label = "*", size = 8) +
  labs(x="Block", y="Envelope Weight Change [%]", 
       title = "B") +
  theme(plot.title = element_text(hjust = 0.5))+
  coord_cartesian(ylim = c(-4, 12))

But I get the following error: "Aesthetics must be either length 1 or the same as the data (40): x and y"

I don't understand what I did differently from the examples in the links that would cause this problem. Can anyone help?

EDIT: Here is the data, exported with dput(). I hope this is the right format; sorry, I'm new to this.

structure(list(B = c("1", "1", "1", "1", "2", "2", "2", "2", 
"3", "3", "3", "3", "4", "4", "4", "4", "5", "5", "5", "5", "1", 
"1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3", "4", "4", 
"4", "4", "5", "5", "5", "5"), BagChange_pr = c(-1.28888888888891, 
-0.0444444444444025, -0.355555555555576, -0.0444444444444025, 
9.82222222222216, 3.68888888888887, -0.355555555555576, -3.4666666666667, 
1.20000000000005, 3.77777777777775, 1.55555555555562, -0.488888888888894, 
3.95555555555556, 4.3555555555556, 2.31111111111109, 4.53333333333332, 
1.55555555555562, -1.28888888888891, -0.400000000000023, 2.31111111111109, 
-1.33333333333334, -1.37777777777779, 2.66666666666671, 3.19999999999998, 
3.3333333333333, 0.488888888888916, 1.33333333333332, 5.82222222222226, 
NA, 5.33333333333337, 3.28888888888892, -1.42222222222218, 4.17777777777775, 
7.91111111111107, 6.53333333333335, 7.06666666666662, 7.24444444444448, 
6.35555555555558, 1.55555555555562, 5.95555555555554)), row.names = c(NA, 
-40L), class = c("tbl_df", "tbl", "data.frame"))

As you can see, B is not numeric even though its values are numbers. That's because this is just a subset of the data: another subset actually contains characters, and I'll have to make the same chart for that subset later.

Thanks for the link. I edited the result of the dput() function in the question. — Paul, Apr 13 '21 at 10:36

score 1 · Accepted Answer · answered Apr 13 '21 at 10:45

You can try the ggsignif package:

library(ggplot2)

# df is the data frame from the question (treatmentonly)
ggplot(df, aes(B, BagChange_pr, fill = B)) +
  geom_boxplot() +
  scale_fill_manual(values = c("#d55e00", "#cc79a7", "#0072b2", "#f0e442", "#009e73")) +
  ggsignif::geom_signif(annotations = "*", y_position = 11, xmin = 2, xmax = 3)

A more general approach, which computes the p-values for chosen pairs of groups with e.g. a t-test:

ggplot(df, aes(B, BagChange_pr, fill =B)) + 
  geom_boxplot() + 
  scale_fill_manual(values = c("#d55e00", "#cc79a7", "#0072b2", "#f0e442", "#009e73")) + 
  ggsignif::geom_signif(comparisons = list(c("2", "3"), c("4","5"), c("2", "4")),
                        step_increase = 0.1,  
                        test = "t.test")
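
As for the original error: the `geom_line()` layer has no `data` of its own, so it uses the 40-row plot data, while the vectors inside `aes()` only have 6 elements. Giving the layer its own data frame avoids that. A minimal sketch of the manual approach, using random stand-in data with the same column names as the question (the values and the name `bracket` are only for illustration):

```r
library(ggplot2)

# Stand-in data: same columns as the question, random values for illustration only
set.seed(1)
df <- data.frame(B = rep(as.character(1:5), each = 8),
                 BagChange_pr = rnorm(40, mean = 2, sd = 2))

# The bracket coordinates go into their own 6-row data frame
bracket <- data.frame(x = c(1, 1:4, 4),
                      y = c(10, 11, 11, 11, 11, 10))

p <- ggplot(df, aes(x = B, y = BagChange_pr)) +
  geom_boxplot() +
  # data = bracket stops the layer from falling back on the 40-row plot data
  geom_line(data = bracket, aes(x = x, y = y), inherit.aes = FALSE) +
  annotate("text", x = 2.5, y = 11.5, label = "*", size = 8) +
  coord_cartesian(ylim = c(-4, 12))

p
```

The numeric x positions 1 to 4 refer to the first to fourth box on the discrete axis, so this should also work when B holds character labels.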

Thank you so much! I tried with the ggsignif package before, but it did only work now with your code. One additional question: Can I change size of the * within this function? If not I'll just draw the line and add the * with another code line for annotation. — Paul, Apr 13 '21 at 10:52
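
On the size of the star: `geom_signif()` has a `textsize` argument for the annotation text, so a separate `annotate()` call should not be needed. A minimal sketch, again with random stand-in data rather than the real measurements; the values 8 and 0.5 are arbitrary choices to adjust to taste:

```r
library(ggplot2)
library(ggsignif)

# Stand-in data: same columns as the question, random values for illustration only
set.seed(1)
df <- data.frame(B = rep(as.character(1:5), each = 8),
                 BagChange_pr = rnorm(40, mean = 2, sd = 2))

p <- ggplot(df, aes(B, BagChange_pr, fill = B)) +
  geom_boxplot() +
  geom_signif(annotations = "*", y_position = 11, xmin = 2, xmax = 3,
              textsize = 8,   # size of the annotation text (default 3.88)
              vjust = 0.5)    # moves the star down, closer to the line

p
```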

score 0 · Answer 2 · answered Apr 13 '21 at 10:53

I suggest using ggpubr. You can tweak the plot as you prefer.

library(ggpubr)
my_comparisons <- list( c("1", "2"), c("2", "3"), c("3", "4") )
ggboxplot(df, x = "B", y = "BagChange_pr",
          color = "B", palette = "jco")+
  stat_compare_means(comparisons = my_comparisons)+ # Add pairwise comparisons p-value
  stat_compare_means(label.y = 20)     # Add global p-value

Or:

# Box plot with jittered points and comparisons against the overall mean
ggboxplot(df, x = "B", y = "BagChange_pr", color = "B", 
          add = "jitter", legend = "none") +
  rotate_x_text(angle = 45)+
  geom_hline(yintercept = mean(df$BagChange_pr, na.rm = TRUE), linetype = 2)+ # Horizontal line at the overall mean (the data contain an NA)
  stat_compare_means(method = "anova", label.y = 15)+        # Add global ANOVA p-value
  stat_compare_means(label = "p.signif", method = "t.test",
                     ref.group = ".all.")                      # Compare each group against all

Good to have multiple options in case something else doesn't work later. Thanks a lot. — Paul, Apr 13 '21 at 11:03
