ANOVA in R with different dependent variables

Question

I know how to run an ANOVA in R, but I always have to duplicate the code to run the ANOVA for another variable. I was wondering whether I could pass the variable names to `aov()` in a loop and store each ANOVA result in its own variable, so I don't have to copy the code block and change it manually.

E.g.:

Variables I want to test: Z, Y, X

Categorical Variable: Treatment

```
Vector <- c("Z", "Y", "X")   # the response variables

for (i in Vector) {
   AnovaZ <- aov(Z ~ Treatment)   # then
   AnovaY <- aov(Y ~ Treatment)   # and so on...
}
```

Is this possible in some way?

You should probably read this: [Fast post hoc computation using R](https://stackoverflow.com/q/51937380/4891738) — Zheyuan Li, Sep 01 '18 at 13:16
@李哲源 I vote to re-open because (in my opinion) OP's question is not (directly) about post-hoc tests. At this point in time OP is asking about how to perform ANOVAs for different response variables in an efficient manner. Post-hoc tests address the issue with increased type I errors due to multiple testing. This may be related, but is not what OP is asking. — Maurits Evers, Sep 01 '18 at 13:47
@李哲源 OP doesn't mention any post-hoc tests (such as e.g. `TukeyHSD`), so I'm not sure what he's planning to do after the ANOVAs. I don't see any "maov" references either. Perhaps OP should clarify. I was under the impression that he simply wants to perform two independent ANOVAs. A post-hoc test is only relevant for controlling the overall false-positive error rate in a single study involving multiple tests. — Maurits Evers, Sep 01 '18 at 14:03

Maurits Evers · Accepted Answer · 2018-09-01T06:38:56.833

There is no need for a `for` loop! You can simply `cbind` the response variables together on the left-hand side of the formula.

Here is an example:

Since you don't provide a sample dataset, I generate some sample data based on the npk dataset, where I add a second response variable yield2 which is the same as yield with some added white noise.
```
set.seed(2018)
df <- npk
df$yield2 <- df$yield + rnorm(nrow(df), mean = 0, sd = 5)
```

Perform ANOVAs based on the two response variables `yield` and `yield2`:

```
res <- aov(cbind(yield, yield2) ~ block, df)
res
#Call:
#   aov(formula = cbind(yield, yield2) ~ block, data = df)
#
#Terms:
#                   block Residuals
#resp 1           343.295   533.070
#resp 2          905.0327  847.2597
#Deg. of Freedom        5        18
#
#Residual standard errors: 5.441967 6.860757
#Estimated effects may be unbalanced
```

`resp 1` and `resp 2` give the sums of squares you would get by running `aov(yield ~ block, df)` and `aov(yield2 ~ block, df)` individually.
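To check that claim, the sketch below refits the sample data both ways and compares the residual sum of squares for each response. It rebuilds the `npk`-based `df` from above; the names `multi`, `single`, `rss_multi` and `rss_single` are only illustrative.

```r
# Rebuild the sample data from this answer
set.seed(2018)
df <- npk
df$yield2 <- df$yield + rnorm(nrow(df), mean = 0, sd = 5)

# One multi-response fit versus two separate fits
multi  <- aov(cbind(yield, yield2) ~ block, data = df)
single <- list(yield  = aov(yield  ~ block, data = df),
               yield2 = aov(yield2 ~ block, data = df))

# Residual sum of squares per response (column-wise for the matrix fit)
rss_multi  <- colSums(residuals(multi)^2)
rss_single <- sapply(single, function(f) sum(residuals(f)^2))
all.equal(rss_multi, rss_single)
```

Both vectors are named `yield` and `yield2`, so `all.equal()` compares them response by response.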

So in your case, assuming `Z`, `Y`, `X` and `Treatment` are columns of a data frame `df`, the command would be something like:

```
res <- aov(cbind(Z, Y, X) ~ Treatment, data = df)
```
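A minimal sketch of that single call on made-up data: the data frame `df`, its three-level `Treatment` factor and the random responses `Z`, `Y`, `X` are all hypothetical, since the question has no data. Calling `summary()` on a multi-response `aov` fit returns one ANOVA table per response, which is usually what you want to inspect.

```r
# Hypothetical data: 3 treatment groups of 10, three random responses
set.seed(1)
df <- data.frame(
  Treatment = factor(rep(c("A", "B", "C"), each = 10)),
  Z = rnorm(30), Y = rnorm(30), X = rnorm(30)
)

# One call fits all three responses against Treatment
res <- aov(cbind(Z, Y, X) ~ Treatment, data = df)

# One ANOVA table per response
tables <- summary(res)
length(tables)   # 3

# Treatment p-value (first row of each table) for each response
p_treatment <- sapply(tables, function(tab) tab[["Pr(>F)"]][1])
p_treatment
```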

Or, if you want to run and store results from separate ANOVAs, store the response variables in a list and use `lapply`:

```
lapply(list(Y = "Y", Z = "Z"), function(x)
    aov(as.formula(sprintf("%s ~ Treatment", x)), df))
```

This produces a list of ANOVA results, where every list element corresponds to a response variable.
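A sketch of the same idea for all three responses, again on a hypothetical `df` with columns `Z`, `Y`, `X` and a factor `Treatment`. It swaps `sprintf()` + `as.formula()` for base R's `reformulate()`, which builds the formula directly, and uses `setNames(nm = ...)` so each list element is named after its response. The last step pulls the `Treatment` p-value out of each fit.

```r
# Hypothetical data, as the question provides none
set.seed(1)
df <- data.frame(
  Treatment = factor(rep(c("A", "B", "C"), each = 10)),
  Z = rnorm(30), Y = rnorm(30), X = rnorm(30)
)

responses <- c("Z", "Y", "X")

# setNames(nm = ...) gives a named vector, so lapply() returns a named list
fits <- lapply(setNames(nm = responses), function(x)
  aov(reformulate("Treatment", response = x), data = df))

# summary() of a single aov fit is a list whose first element is the ANOVA table
p_values <- sapply(fits, function(f) summary(f)[[1]][["Pr(>F)"]][1])
p_values
```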

DanY · Answer 2 · 2018-09-02T01:32:44.140


If you want to use a loop, the trick is to build each formula from a string with `as.formula(paste())`.

Create a list (we'll call it result) to store each aov output. Then loop through dependent variable names stored in Vector:

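```
n <- length(Vector)            # Vector holds names as strings, e.g. c("Z", "Y", "X")
result <- vector(mode = "list", length = n)
for (i in seq_len(n)) {
    # paste() builds e.g. "Z ~ Treatment"; add `data = df` if the variables live in a data frame
    result[[i]] <- aov(as.formula(paste(Vector[i], "~ Treatment")))
}
names(result) <- Vector        # label each fit with its response
```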
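A runnable version of this loop on the `npk`-based sample data from the accepted answer, with `yield` and `yield2` standing in for the question's variables and `block` standing in for `Treatment` (an assumption for illustration only). Looping over the names themselves rather than over indices gives a list whose elements are already named after the responses.

```r
# Sample data from the accepted answer
set.seed(2018)
df <- npk
df$yield2 <- df$yield + rnorm(nrow(df), mean = 0, sd = 5)

responses <- c("yield", "yield2")          # names of the response columns
result <- vector(mode = "list", length = length(responses))
names(result) <- responses

for (resp in responses) {
  # paste() builds e.g. "yield ~ block"; as.formula() turns it into a formula
  result[[resp]] <- aov(as.formula(paste(resp, "~ block")), data = df)
}

summary(result$yield2)
```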

answered Sep 01 '18 at 03:10 · edited Sep 02 '18 at 01:32

Answer 3 · answered Sep 02 '18 at 05:27

Another solution is to use list columns and purrr::map. This can be useful when working with many models (e.g. see http://r4ds.had.co.nz/many-models.html).

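```
library(tidyverse)

# One-way ANOVA of `value` on carb, treated as a factor (categorical treatment)
aov_f <- function(df) {aov(value ~ factor(carb), data = df)}

# Reshape mpg..gear to long format, nest one data frame per variable, fit each
mtcars_n <- gather(mtcars, obs, value, mpg:gear) %>%
  group_by(obs) %>%
  nest() %>%
  mutate(aov = map(data, aov_f))
```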
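If you'd rather avoid the tidyverse, roughly the same "one model per response" structure can be sketched in base R with `stack()` and `split()`. This is a swapped-in technique, not the answer's own code; it treats `carb` as a factor so each fit is a one-way ANOVA, and `vars`, `long` and `fits` are illustrative names.

```r
# The 10 response variables: every mtcars column except carb
vars <- setdiff(names(mtcars), "carb")

# Long format: stack() gives `values` and `ind` (the variable name);
# carb is repeated once per stacked column so rows stay aligned
long <- data.frame(stack(mtcars[vars]),
                   carb = rep(mtcars$carb, times = length(vars)))

# One data frame per response, then one ANOVA per data frame
fits <- lapply(split(long, long$ind), function(d)
  aov(values ~ factor(carb), data = d))

summary(fits$mpg)
```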

Matt Nolan

Thank you Matt, I will look at it! – Paulo Barros Nov 17 '18 at 22:11
