2

I'm working on a dataframe (called df) looking something like this (shortened here for practical reasons):

Observed Shannon InvSimpson Evenness Month
688 4.553810 23.365814 0.6969632 February
749 4.381557 15.162467 0.6619927 February
610 3.829187 11.178981 0.5970548 February
665 4.201113 16.284009 0.6463463 March
839 5.185554 43.198709 0.7702601 March
757 4.782258 31.011366 0.7213751 March
516 3.239304 4.765211 0.5186118 April
... ... ... ... ...

I am running a post-hoc test using Dunn's test, then adding the xy positions, and plotting everything as boxplots. It works but my code is very repetitive...

library(rstatix)

obs_dunn <- dunn_test(Observed ~ Month, data=df, p.adjust.method="BH")
obs_dunn <- obs_dunn %>% arrange(p.adj)
obs_dunn <- obs_dunn %>% add_xy_position(x = "Month")
obs_bxp <- ggboxplot(df, x = "Month", y = "Observed" ) + 
  stat_pvalue_manual(obs_dunn, label = "p.adj.signif", hide.ns = TRUE)
obs_bxp

sh_dunn <- dunn_test(Shannon ~ Month, data=df, p.adjust.method="BH")
sh_dunn <- sh_dunn %>% arrange(p.adj)
sh_dunn <- sh_dunn %>% add_xy_position(x = "Month")
sh_bxp <- ggboxplot(df, x = "Month", y = "Shannon" ) + 
  stat_pvalue_manual(sh_dunn, label = "p.adj.signif", hide.ns = TRUE)
sh_bxp

inv_dunn <- dunn_test(InvSimpson ~ Month, data=df, p.adjust.method="BH")
inv_dunn <- inv_dunn %>% arrange(p.adj)
inv_dunn <- inv_dunn %>% add_xy_position(x = "Month")
inv_bxp <- ggboxplot(df, x = "Month", y = "InvSimpson" ) + 
  stat_pvalue_manual(inv_dunn, label = "p.adj.signif", hide.ns = TRUE)
inv_bxp

ev_dunn <- dunn_test(Evenness ~ Month, data=df, p.adjust.method="BH")
ev_dunn <- ev_dunn %>% arrange(p.adj)
ev_dunn <- ev_dunn %>% add_xy_position(x = "Month")
ev_bxp <- ggboxplot(df, x = "Month", y = "Evenness" ) + 
  stat_pvalue_manual(ev_dunn, label = "p.adj.signif", hide.ns = TRUE)
ev_bxp

I get basic boxplots with the significance stars added in, here's the result for the "Observed" one for example:

enter image description here

It's the same lines of code for each indices (Observed, Shannon, InvSimpson, Evenness), so I would like to make a for loop, but I'm quite new at that and I'm really struggling.

Do you have any idea how I could run a loop for the dunn_test(), add_xy_position() and ggboxplot() on my 4 indices? Preferably with a separate dataframe as output for each indices.

Even just a loop for the first step dunn_test() would already be so much help, because I don't know where to start...

Thanks in advance for any advice you would have. :)

jay.sf
  • 60,139
  • 8
  • 53
  • 110

3 Answers3

2

Just use formula and data as arguments in a function as well as ... for additional arguments such as bracket.nudge.y= to adjust for the silly n.s whitespace.

dunn_bxp <- \(fo, data, plot=TRUE, ...) {
  vars <- all.vars(fo)
  dnn <- rstatix::dunn_test(fo, data=data, p.adjust.method="BH")
  dnn_p <- rstatix::add_xy_position(dnn, x=vars[2])
  if (plot) {
    p <- ggpubr::ggboxplot(data, x=vars[2], y=vars[1]) + 
      ggpubr::stat_pvalue_manual(dnn_p, label="p.adj.signif", hide.ns=TRUE, ...)
    print(p)
    return(invisible(as.data.frame(dnn[1:8])))
  } else {
    return(as.data.frame(dnn[1:8]))
  }
}

If plot=TRUE (the default), it plots and invisibly returns the statistics.

r <- dunn_bxp(fo=observed ~ month, data=df, bracket.nudge.y=-1000)
head(r)
#        .y. group1 group2 n1 n2  statistic           p      p.adj
# 1 observed    Apr    Aug  3  3 -0.6199874 0.535266078 0.71803318
# 2 observed    Apr    Dec  3  3 -2.7124449 0.006678888 0.08816133
# 3 observed    Apr    Feb  3  3 -1.1237272 0.261128784 0.50954093
# 4 observed    Apr    Jan  3  3 -1.7049654 0.088200884 0.32174752
# 5 observed    Apr    Jul  3  3 -1.7049654 0.088200884 0.32174752
# 6 observed    Apr    Jun  3  3  0.7749843 0.438348961 0.64291181

If plot=FALSE it just returns the statistics.

head(dunn_bxp(fo=observed ~ month, data=df, plot=FALSE))
#        .y. group1 group2 n1 n2  statistic           p      p.adj
# 1 observed    Apr    Aug  3  3 -0.6199874 0.535266078 0.71803318
# 2 observed    Apr    Dec  3  3 -2.7124449 0.006678888 0.08816133
# 3 observed    Apr    Feb  3  3 -1.1237272 0.261128784 0.50954093
# 4 observed    Apr    Jan  3  3 -1.7049654 0.088200884 0.32174752
# 5 observed    Apr    Jul  3  3 -1.7049654 0.088200884 0.32174752
# 6 observed    Apr    Jun  3  3  0.7749843 0.438348961 0.64291181

enter image description here


Data:

set.seed(557)
df <- data.frame(observed=round(rep(runif(12, 500, 800), each=3) + rep.int(rnorm(12, 0, 64), 3)),
                 month=rep(month.abb, each=3))
jay.sf
  • 60,139
  • 8
  • 53
  • 110
  • 2
    Wow thanks, it's a really neat code, and I'll be able to easily apply it to other datasets! To get rid of the ```ns``` I used the following line of code in between the ```dunn_test``` and the ```add_xy_position``` : ```dnn <- dplyr::arrange(.data = dnn, p.adj)```. That way I shouldn't have to change the ```bracket.nudge.y``` with each new dataset :) – sophiethomas Apr 12 '23 at 18:44
1

Here is a function that takes a character string as an argument. You can call this function for each of the variables.

dunn_boxplot <- function(yvariable) {
  dunn <- dunn_test(formula(paste(yvariable, "~ Month")), data=df, p.adjust.method="BH") %>%
    arrange(p.adj) %>%
    add_xy_position(x = "Month")
  bxp <- ggboxplot(df, x = "Month", y = yvariable) + 
    stat_pvalue_manual(dunn, label = "p.adj.signif", hide.ns = TRUE)
}

Call the function as follows. Repeat for your other three variables.

obs_bxp <- dunn_boxplot("Observed")

The key thing to notice is that your four blocks of code only differ in the y variable name. So it is passed as an argument. You need to use formula(paste(...) to assemble your formula as a character string and then coerce it to the formula class.

qdread
  • 3,389
  • 19
  • 36
  • Thank you for your answer, however I get the following error when I run it: Error in `select()`: ! Can't subset columns that don't exist. ✖ Column `formula(paste(yvariable, "~ Month"))` doesn't exist. Backtrace: 1. global dunn_boxplot("Observed") 26. dplyr:::select.data.frame(., !!!syms(c(outcome, group))) – sophiethomas Apr 12 '23 at 01:33
  • I made a mistake in the code. It should work now. Sorry about that. Other answers are good too. – qdread Apr 12 '23 at 09:30
  • Thanks a lot @qdread I still got an error with `ggboxplot(df, x = "Month", y = y_variable)` : `Error in ggboxplot(df, x = "Month", y = y_variable) : object 'y_variable' not found` So I used `ggplot`'s `geom_boxplot` instead. Like so: `bxp <- ggplot2::ggplot(df) + geom_boxplot(aes(x = .data[["Month"]], y = .data[[yvariable]]))` – sophiethomas Apr 12 '23 at 18:52
  • Aha, I found another typo in my code. I accidentally wrote `y_variable` in one place and `yvariable` in another. I was not able to test the code because you did not provide a reproducible example, which would have made it easier to check if there were any mistakes. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?rq=1 – qdread Apr 12 '23 at 23:54
1

qdread showed a super smooth approach. I have a different approach, using a for loop. Since you did not post a reproducible example I could not test my code and it may produce errors, but it should give you an idea. Best of luck!

list <- list() # creates empty list
count<-1       # creates an indicator starting with one, increasing by one every iteration, first plot will be saved as first item of list, second plot as second item, ...

for (i in c("Observed","Shannon","InvSimpson","Evenness")){            # i will take the first value in c(...) in the first iteration
                                                                       # the second value in the second iteration, ...
  list[[count]] <- eval(parse(
    text=paste0('dunn_test(',i,'~ Month, data=df, p.adjust.method="BH")') #paste0 pasts a string, eval(parse(text="string")) executes the string
    ))
  list[[count]] <- list[[count]] %>% arrange(p.adj)
  list[[count]] <- list[[count]] %>% add_xy_position(x = "Month")
  list[[count]] <- eval(parse(text=paste0('ggboxplot(df, x =,',i,', y = paste0(i)) + 
                                stat_pvalue_manual(',list[[count]],', label = "p.adj.signif", hide.ns = TRUE)')))
  count<- count+1 # count indicator increases by 1
}
ajj
  • 55
  • 6