
I know how to run an ANOVA in R, but I always have to duplicate the code to run the ANOVA for another variable. I was wondering if I could somehow pass the variable names to aov() in a loop and store the result of each ANOVA in its own variable, so I don't have to manually edit a copied code block every time.

E.g.:

Variables I want to test: Z, Y, X

Categorical variable: Treatment

Vector <- c("Z", "Y", "X")

for (i in Vector) {
   AnovaZ <- aov(Z ~ Treatment) # then
   AnovaY <- aov(Y ~ Treatment) # and so on...
}

Is this possible in some way?

Paulo Barros
  • You should probably read this: [Fast post hoc computation using R](https://stackoverflow.com/q/51937380/4891738) – Zheyuan Li Sep 01 '18 at 13:16
  • @李哲源 I vote to re-open because (in my opinion) OP's question is not (directly) about post-hoc tests. At this point in time OP is asking about how to perform ANOVAs for different response variables in an efficient manner. Post-hoc tests address the issue with increased type I errors due to multiple testing. This may be related, but is not what OP is asking. – Maurits Evers Sep 01 '18 at 13:47
  • @李哲源 OP doesn't mention any post-hoc tests (such as e.g. `TukeyHSD`), so I'm not sure what he's planning to do after the ANOVAs. I don't see any "maov" references either. Perhaps OP should clarify. I was under the impression that he simply wants to perform two independent ANOVAs. A post-hoc test is only relevant for controlling the overall false-positive error rate in a single study involving multiple tests. – Maurits Evers Sep 01 '18 at 14:03

3 Answers


There is no need for a for loop! You can simply cbind the different response variables together on the left-hand side of the formula.

Here is an example:

  1. Since you don't provide a sample dataset, I generate some sample data based on the npk dataset, where I add a second response variable yield2 which is the same as yield with some added white noise.

    set.seed(2018)
    df <- npk
    df$yield2 <- df$yield + rnorm(nrow(df), mean = 0, sd = 5)
    
  2. Perform ANOVAs based on the two response variables yield and yield2

    res <- aov(cbind(yield, yield2) ~ block, df)
    #Call:
    #   aov(formula = cbind(yield, yield2) ~ block, data = df)
    #
    #Terms:
    #                   block Residuals
    #resp 1           343.295   533.070
    #resp 2          905.0327  847.2597
    #Deg. of Freedom        5        18
    #
    #Residual standard errors: 5.441967 6.860757
    #Estimated effects may be unbalanced
    

    resp 1 and resp 2 give the same sums of squares you would get by running aov(yield ~ block, df) and aov(yield2 ~ block, df) individually.

So in your case, the command would be something like

res <- aov(cbind(Y, Z) ~ Treatment)
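To get the full ANOVA table (F values and p-values) for each response from a single multi-response fit, summary() can be applied to the result. The sketch below reuses the simulated npk-based df from step 1 as a stand-in, since the OP's Z, Y and Treatment are not available; block plays the role of Treatment.

```r
# Stand-in data (assumption): npk plus a noisy copy of yield, as in step 1
set.seed(2018)
df <- npk
df$yield2 <- df$yield + rnorm(nrow(df), mean = 0, sd = 5)

# One aov() call with both responses bound together
res <- aov(cbind(yield, yield2) ~ block, df)

# summary() of a multi-response aov returns one ANOVA table per response,
# named " Response yield" and " Response yield2"
tabs <- summary(res)
tabs

# The first table should match a separate single-response fit
summary(aov(yield ~ block, df))
```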

Or if you want to run and store results from separate ANOVAs, store the response variables in a list and use lapply:

lapply(list(Y = "Y", Z = "Z"), function(x)
    aov(as.formula(sprintf("%s ~ Treatment", x)), df))

This produces a list of ANOVA results, where every list element corresponds to a response variable.
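A variant of the lapply approach, again assuming the npk-based df from step 1 with block standing in for Treatment: base R's reformulate() builds the formula from a character string without sprintf(), and setNames() makes the list elements carry the response names.

```r
# Stand-in data (assumption): npk plus a noisy copy of yield, as in step 1
set.seed(2018)
df <- npk
df$yield2 <- df$yield + rnorm(nrow(df), mean = 0, sd = 5)

responses <- c("yield", "yield2")

# reformulate("block", response = "yield") builds the formula yield ~ block
fits <- lapply(setNames(responses, responses), function(v)
  aov(reformulate("block", response = v), data = df))

# Each element is a separate aov fit, accessible by response name
summary(fits$yield2)
```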

Maurits Evers

If you want to do a loop, the trick is to use as.formula(paste()).

Create a list (we'll call it result) to store each aov output. Then loop through the dependent variable names, stored as strings in Vector:

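n <- length(Vector)
result <- vector(mode = "list", length = n)  # pre-allocate one slot per response
for (i in 1:n) {
    # build e.g. "Z ~ Treatment" as text, then convert it to a formula
    result[[i]] <- aov(as.formula(paste(Vector[i], "~ Treatment")))
}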
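A self-contained version of this loop, sketched on stand-in data (npk plus a noisy copy of yield, with block in place of Treatment), since the OP's data are not available. Naming the list by response makes each fit reachable as result$yield and so on, and sapply() then pulls out the p-value of the factor from every fit.

```r
# Stand-in data (assumption): npk plus a noisy copy of yield
set.seed(2018)
df <- npk
df$yield2 <- df$yield + rnorm(nrow(df), mean = 0, sd = 5)

Vector <- c("yield", "yield2")  # response variable names as strings
result <- vector(mode = "list", length = length(Vector))
names(result) <- Vector         # so result$yield, result$yield2 work

for (v in Vector) {
  result[[v]] <- aov(as.formula(paste(v, "~ block")), data = df)
}

# First row of each ANOVA table is the block term; take its p-value
p_values <- sapply(result, function(fit) summary(fit)[[1]][["Pr(>F)"]][1])
p_values
```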
DanY

Another solution is to use list columns and purrr::map. This can be useful when working with many models (e.g. see http://r4ds.had.co.nz/many-models.html).

library(tidyverse)

aov_f <- function(df) {aov(value ~ factor(carb), data = df)}  # factor() so carb is treated as categorical

mtcars_n <- gather(mtcars, obs, value, mpg:gear) %>%
  group_by(obs) %>%
  nest() %>%
  mutate(aov = map(data, aov_f))
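The list-column approach can be extended to pull a summary statistic out of every model in the same pipeline. The sketch below assumes the tidyverse is installed; it wraps carb in factor() so it is treated as a categorical grouping variable, and uses purrr::map_dbl() to extract the p-value of the carb term from each fit.

```r
library(tidyverse)

# carb as a factor so aov() fits a one-way ANOVA rather than a regression slope
aov_f <- function(df) aov(value ~ factor(carb), data = df)

mtcars_n <- gather(mtcars, obs, value, mpg:gear) %>%  # long format: one row per (variable, car)
  group_by(obs) %>%
  nest() %>%                                          # one row per response variable
  mutate(fit = map(data, aov_f),
         # first row of each ANOVA table is the carb term
         p_value = map_dbl(fit, ~ summary(.x)[[1]][["Pr(>F)"]][1]))

mtcars_n
```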
Matt Nolan