
I am trying to add p-values to each boxplot pair in the graph shown below. I would like the p-values to be placed under each soil horizon label ('O', 'A' and 'B').

My data looks like this:

> head(kiwi_l)
# A tibble: 6 x 6
  type         horizon root_name  length diameter n_child
  <chr>        <chr>   <chr>       <dbl>    <dbl>   <int>
1 Elevated CO2 A       R1_A_L_S4G 0.0752   0.0342       0
2 Elevated CO2 A       R1_A_L_S4F 0.0987   0.0319       0
3 Elevated CO2 A       R1_A_L_S4E 0.105    0.0209       0
4 Elevated CO2 A       R1_A_L_S4D 0.0476   0.0127       0
5 Elevated CO2 A       R1_A_L_S4C 0.110    0.0282       0
6 Elevated CO2 A       R1_A_L_S4B 0.244    0.0168       0

While the code I used to generate the graph is:

library(ggplot2)
library(ggpubr)  # theme_pubr()
library(ggsci)   # scale_fill_locuszoom()

# level_order (the desired order of the horizons) is defined elsewhere
l_horizon <- ggplot(kiwi, aes(x = type, y = length, fill = type)) +
  geom_boxplot() +
  facet_grid(. ~ factor(horizon, levels = level_order)) +
  theme_pubr() +
  scale_y_continuous(name = 'Primary root length (cm)') +
  scale_x_discrete(name = 'Treatment') +
  ggtitle('Soil horizon') + theme(plot.title = element_text(hjust = 0.5)) +
  theme(legend.position = "none") +
  theme(plot.title = element_text(size = 10, face = "bold"),
        text = element_text(size = 10),
        axis.title = element_text(face = "bold"),
        axis.text.x = element_text(size = 10),
        axis.text.y = element_text(size = 10),
        axis.title.x = element_blank(),
        axis.title.y = element_text(size = 10))

l_horizon <- l_horizon + scale_fill_locuszoom()
l_horizon

Please help!

  • 3
    Hard to know without a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). What packages are you using besides `ggplot2`? `ggpubr`? What is `mtext`? Also, p-values of what, and where do you intend to get them from? – camille May 16 '19 at 13:25

2 Answers


Since there is no data to play around with, I'll make up some:

set.seed(0)
df <- data.frame(f1 = factor(rep(c("O","A","B"), each = 30)),
                 f2 = rep(c("M","N"), 45),
                 y = rnorm(90))

Next, we run a t-test within each level of f1 and format its output:

library(magrittr)  # provides %>%

tests <- split(df, df$f1) %>% sapply(function(x){
  # x$y[...] returns a plain vector for both data.frames and tibbles
  pval <- t.test(x$y[x$f2 == "M"], x$y[x$f2 == "N"])$p.value
  paste0("p-value = ", format(pval, digits = 2, nsmall = 2))
})
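If a t-test is not appropriate for your data, the same pattern works with a rank-based test. The following is only a minimal sketch using `wilcox.test()`: `kiwi_demo` is a hypothetical stand-in that borrows the column names from the question (`type`, `horizon`, `length`), its values are simulated, and the "Ambient CO2" level is an assumption, since the question only shows "Elevated CO2".

```r
# Hypothetical stand-in for the asker's data: same column names as kiwi_l,
# but simulated values. "Ambient CO2" is an assumed second treatment level.
set.seed(1)
kiwi_demo <- data.frame(
  type    = rep(c("Ambient CO2", "Elevated CO2"), times = 30),
  horizon = rep(c("O", "A", "B"), each = 20),
  length  = rexp(60, rate = 10)
)

# One Wilcoxon rank-sum test per horizon, comparing the two treatments.
# The formula interface avoids the row/column indexing that differs
# between data.frames and tibbles.
p_by_horizon <- sapply(split(kiwi_demo, kiwi_demo$horizon), function(x) {
  wilcox.test(length ~ type, data = x)$p.value
})

# Format as labels; the names (A, B, O) identify the horizon.
p_labels <- paste0("p = ", signif(p_by_horizon, 2))
names(p_labels) <- names(p_by_horizon)
p_labels
```

`p_labels` is a named character vector with one entry per horizon, so it can be used in the same way as `tests`.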

Now if you want it to be part of the facet strip, you can adjust the levels of df$f1 to include the p-value:

df$f1 <- factor(df$f1)  # since R 4.0, data.frame() keeps strings as character
levels(df$f1) <- paste0(levels(df$f1), "\n", tests[levels(df$f1)])

ggplot(df, aes(x = f2, y = y)) +
  geom_boxplot() +
  facet_grid(~ f1)


If you want the p-values inside the panel instead of in the strip, you can use annotate(). Setting y = Inf places them at the top of the panel.

ggplot(df, aes(x = f2, y = y)) +
  geom_boxplot() +
  facet_grid(~ f1) +
  annotate("text", x = 1.5, y = Inf, label = tests, vjust = 1)
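
annotate() has no notion of facets, so how a vector of labels is distributed over the panels may depend on the ggplot2 version. A more explicit alternative, sketched here under the same made-up data, is to put the labels in a small data frame that carries the faceting variable and draw them with geom_text(). The names `pvals` and `label_df` are introduced only for this illustration.

```r
library(ggplot2)

# Same made-up data as above, with f1 stored as a factor
set.seed(0)
df <- data.frame(f1 = factor(rep(c("O", "A", "B"), each = 30)),
                 f2 = rep(c("M", "N"), 45),
                 y  = rnorm(90))

# One p-value per level of f1
pvals <- sapply(split(df, df$f1), function(x) {
  t.test(y ~ f2, data = x)$p.value
})

# One row per panel; the f1 column ties each label to its facet
label_df <- data.frame(
  f1    = factor(names(pvals), levels = levels(df$f1)),
  label = paste0("p-value = ", format(pvals, digits = 2, nsmall = 2))
)

p <- ggplot(df, aes(x = f2, y = y)) +
  geom_boxplot() +
  facet_grid(~ f1) +
  # x = 1.5 centres the label between the two boxes; y = Inf pins it to the top
  geom_text(data = label_df,
            aes(x = 1.5, y = Inf, label = label),
            vjust = 1.5, inherit.aes = FALSE)
p
```

Because `label_df` contains `f1`, each row is drawn only in the panel it belongs to, regardless of the order of the labels.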


teunbrand
  • Hi, thank you for your answer! When I try to use the code to get the p-value, though, I get the following error: 'Error during wrapup: Can't use matrix or array for column indexing'. Do you know what I'm doing wrong? Thanks so much for your help! – polarsandwich May 16 '19 at 14:52
  • 1
    That's why a working sample of data is needed. We have no way of knowing what type of data you're using. Also @teunbrand may be right to assume you're getting p-values from a t-test within each A/B/O group, but again this is something that needs to be included in the question. P-values aren't just numbers that are inherent to the data—they come from statistical tests that you've made some decisions about – camille May 16 '19 at 15:05
  • I agree with @camille here: I have no way to reproduce your error, so I can't try to figure out a solution for your problem. And indeed, I used a t-test, which is appropriate for the distribution I sampled from, but if you don't know what test is appropriate for your data, you might be better off using the non-parametric `wilcox.test()`. – teunbrand May 16 '19 at 16:21
  • Sorry guys, I am new to this forum :( How do I post my working sample? I am using the Wilcoxon test because the data are not normally distributed – polarsandwich May 16 '19 at 16:45
  • Usually posting the output of `dput(head(your_data))` (for relevant objects in your question), is a good way. – teunbrand May 16 '19 at 17:09
  • Click the link in my first comment to the question. It goes to a post on how to make a reproducible example – camille May 16 '19 at 19:13
  • Hi all, I posted a snippet of my data, is this ok? Thanks a lot, I really appreciate your help – polarsandwich May 17 '19 at 13:08
  • Hi, I now managed to reproduce and solve the error. What solved it for me was calling `df <- as.data.frame(df)` before doing the testing loop, since that loop wasn't prepared to work with tibbles but with base R data.frames. As a tip for next questions, `dput(head(kiwi_l))` would in this case have been somewhat more convenient to people trying to help you, because it lets them simply copy-paste the data and get the data structure as you have it in your session. – teunbrand May 17 '19 at 19:24

If you know where on the y-axis to put the text, maybe annotate like this?

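library(ggplot2)

# Placeholder labels, not real p-values; d2 is not shown in this answer
p_values <- c(1.1, 2.2, 3.3)

ggplot(data = d2, mapping = aes(x = range, y = p_area)) +
  geom_boxplot() +
  annotate("text", x = c(1, 2, 3), y = 0.5, label = p_values)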


  • What's d2 and why would p-values be that large? They're not just arbitrary numbers – camille May 16 '19 at 15:01
  • @camille I guess it is just an example of "how to add text to boxplots", could have been any value. Agree, pvalue is not coming from the data. – zx8754 May 16 '19 at 15:05