1

I have a dataset that looks like this, but with a few dozen more dependant variables.

set.seed(108)
test <- data.frame(
    n = 1:12,
    treatment = factor(paste("trt", 1:2)),
    type = sample(LETTERS, 3),
    var1 = sample(1:100, 12),
    var2 = sample(1:100, 12),
    var3 = sample(1:100, 12),
    var4 = sample(1:100, 12))

I would like to run a two-way ANOVA (effect of treatment and type on each of the dependant variables), and I am trying to do it automatically. Eventually, I'd like to plot barplots of each of the dependant variables including a compact letter display of the significance letters on each of the barplots. The letters would result from the ANOVA and pairwise comparison test, Example: https://statdoe.com/barplot-for-two-factors-in-r/ , section: Adding the compact letter display).

Could somebody give me hints to run this automatically? Or should I just give up and do it manually?

Andrea M
  • 2,314
  • 1
  • 9
  • 27
bobo
  • 23
  • 1
  • 5

1 Answers1

0

Does this work for you? If so, please also read my notes below.

# packages and function conflicts
library(broom)
library(conflicted)
library(emmeans)
library(multcomp)
library(multcompView)
library(tidyverse)

conflict_prefer("select", winner = "dplyr")
#> [conflicted] Will prefer dplyr::select over any other package

# data
set.seed(108)
test <- data.frame(
  n = 1:12,
  treatment = factor(paste("trt", 1:2)),
  type = sample(LETTERS, 3),
  var1 = sample(1:100, 12),
  var2 = sample(1:100, 12),
  var3 = sample(1:100, 12),
  var4 = sample(1:100, 12))

# Loop setup
var_names <- test %>% select(contains("var")) %>% names()
emm_list <- list()
anova_list <- list()

# Loop
for (var_i in var_names) {
  test_i <- test %>%
    rename(y_i = !!var_i) %>%
    select(treatment, type, y_i)
  
  mod_i <- lm(y_i ~ treatment*type, data = test_i)
  
  anova_list[[var_i]] <- mod_i %>% anova %>% tidy()
  
  emm_i <- emmeans(mod_i, ~treatment:type) %>%
    cld(Letters = letters)
  
  emm_list[[var_i]] <- as_tibble(emm_i)
}

# Combine emmean results
emm_out <- bind_rows(emm_list, .id = "id")

# Plot results
ggplot() +
  facet_wrap(~ id) +
  theme_classic() +
  # mean
  geom_point(
    data = emm_out,
    aes(y = emmean, x = type, color = treatment),
    shape = 16,
    alpha = 0.5,
    position = position_dodge(width = 0.2)
  ) +
  # red mean errorbar
  geom_errorbar(
    data = emm_out,
    aes(ymin = lower.CL, ymax = upper.CL, x = type, color = treatment),
    width = 0.05,
    position = position_dodge(width = 0.2)
  ) +
  # red letters
  geom_text(
    data = emm_out,
    aes(
      y = emmean,
      x = type, 
      color = treatment,
      label = str_trim(.group)
    ),
    position = position_nudge(x = 0.1),
    hjust = 0,
    show.legend = FALSE
  )

Created on 2022-09-01 with reprex v2.0.2

  • First of all, this question is not a duplicate of, but similar to the question "loop Tukey post hoc letters extraction".
  • Then note that you are talking about two separate things: ANOVA results and pairwise comparison test results. The former is stored in anova_list and the second in emm_list/emm_out. Only the second can get you the compact letter display and I did that here via cld(). Pleas see this chapter on compact letter displays for more details. Note that in that chapter I also provide code for bar charts instead of the one above, but also explain why bar charts may not be the best choice.
  • The topic of compact letter displays is a bit more complex when you have two factors and especially interactions between them. I was assuming that you always want to look at the interaction term, but maybe check out this answer for details on what your options are.
Paul Schmidt
  • 1,072
  • 10
  • 23