I am trying to run a two-way ANOVA on multiple subsets of a data frame without having to subset the data manually, as that is inefficient.

Example data:

DF<-structure(list(Sample = c(666L, 676L, 686L, 667L, 677L, 687L, 
822L, 832L, 842L, 824L, 834L, 844L), Time = c(300L, 300L, 300L, 
300L, 300L, 300L, 400L, 400L, 400L, 400L, 400L, 400L), Ploidy = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("2n", 
"3n"), class = "factor"), Tissue = c("muscle", "muscle", "muscle", 
"liver", "liver", "liver", "intestine", "intestine", "intestine", 
"gill", "gill", "gill"), X.lipid = c(1.1, 0.8, 1.3, 3.7, 3.9, 
3.8, 5.2, 3.4, 6, 7.6, 10.4, 6.7), l.dec = c(0.011, 0.008, 0.013, 
0.037, 0.039, 0.038, 0.052, 0.034, 0.06, 0.076, 0.104, 0.067), 
l.arc = c(0.105074124512229, 0.0895624074394449, 0.114266036973812, 
0.193560218793138, 0.19879088899975, 0.196192082631721, 0.230059118691331, 
0.185452088760136, 0.247467063170448, 0.279298057669285, 
0.328359182374352, 0.261824790465914)), .Names = c("Sample", 
"Time", "Ploidy", "Tissue", "X.lipid", "l.dec", "l.arc"), row.names = c(1L, 
2L, 3L, 4L, 5L, 6L, 69L, 70L, 71L, 72L, 73L, 74L), class = "data.frame")

I have come across similar examples: "Anova, for loop to apply function" and "ANOVA on multiple responses, by multiple groups NOT part of formula".

I can get close, but I do not believe this is correct, as it uses aov rather than anova:

x<- unique(DF$Tissue)

sapply(x, function(my) {
f <- as.formula(paste("l.dec~Time*Ploidy"))
aov(f, data=DF)
}, simplify=FALSE)

If I switch aov for anova, it returns an error message:

 Error in UseMethod("anova") : 
 no applicable method for 'anova' applied to an object of class "formula" 

The long way around, which is correct, is as follows:

#Subset by each Tissue type (just one here for e.g.)
muscle<- subset (DF, Tissue == "muscle")
#Perform Anova
anova(lm(l.dec ~ Ploidy * Time, data = muscle))

However, the main data frame has many tissue types, and I want to avoid performing this subset for each one.

I believe the sapply approach is close, but I need help with the final steps.

Salmo salar
  • `sapply(x, function(my) { anova(lm(l.arc~Time*Ploidy, data=DF, Tissue==my)) }, simplify=FALSE)`. This uses the `subset` argument of `lm`. – user20650 May 30 '14 at 21:57
  • `anova` generates an ANOVA table from a fitted linear model (it doesn't fit the model itself; that's what `lm` and `aov` do). You have to use `anova(lm(...))` or `anova(aov(...))` to get an ANOVA table, as you did in your standalone example. `aov` is a wrapper for `lm` which prints its results in a more "traditional" ANOVA form (see the help for aov: `?aov`). – andyteucher May 30 '14 at 22:11
  • Thanks for the pointers. A few more tweaks and this works: `sapply(x, function(my) { anova(lm(l.arc~Ploidy*Time, data=DF, Tissue==my)) }, simplify=FALSE)` – Salmo salar May 30 '14 at 22:23

1 Answer

Building on @user20650's comment and mine above, I would suggest first using sapply with lm to generate your list of models, and then using sapply again on that list to generate your ANOVA tables. That way the list of models stays available, so you can also extract coefficients, fitted values, residuals, etc.

x <- unique(DF$Tissue)

models <- sapply(x, function(my) {
  lm(l.dec ~ Time * Ploidy, data=DF, Tissue==my)
}, simplify=FALSE)

ANOVA.tables <- sapply(models, anova, simplify=FALSE)
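
As a rough illustration of why keeping the list of models is useful, here is a minimal sketch of the same pattern written with split() and lapply() instead of the subset argument. It runs on a made-up data frame, not the asker's data: `toy`, its random `l.dec` values, and the object names `anova_tables` and `coefs` are all assumptions for the example. The toy data has both ploidy levels and both time points within every tissue, which the model needs in order to estimate each term.

```r
# Hypothetical toy data (NOT the asker's): each tissue has both ploidy
# levels and both time points, with three replicates per combination.
set.seed(1)
toy <- expand.grid(
  Tissue = c("muscle", "liver"),
  Ploidy = c("2n", "3n"),
  Time   = c(300, 400),
  rep    = 1:3,
  stringsAsFactors = FALSE
)
toy$l.dec <- runif(nrow(toy), min = 0.01, max = 0.1)  # random placeholder response

# split() cuts the data frame into one piece per tissue;
# lapply() then fits the same two-way model to each piece.
models <- lapply(split(toy, toy$Tissue), function(d) {
  lm(l.dec ~ Ploidy * Time, data = d)
})

# One ANOVA table per tissue, in a list named by tissue.
anova_tables <- lapply(models, anova)

# Because the fitted models are kept, other quantities are one call away.
coefs <- lapply(models, coef)

anova_tables$muscle
```

Either form returns a list named by tissue, so individual results can be pulled out with `$` or `[[ ]]`. Whether split() or the subset argument reads better is largely a matter of taste.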
andyteucher