I'm trying to loop through a large dataframe (5413 columns) and run an ANOVA on each column, but I get an error when I do so.
Ideally, I'd like the P-value from each ANOVA written to a new row in a dataframe alongside the column title. Given my limited knowledge, though, I'm currently writing the P-value outputs to files that I can parse in bash.
Here's an example layout of the data:
data()
Name, Group, aaaA, aaaE, bbbR, cccD
Apple, Fruit, 1.23, 0.45, 0.3, 1.1
Banana, Fruit, 0.54, 0.12, 2.0, 1.32
Carrot, Vegetable, 0.01, 0.05, 0.45, 0.9
Pear, Fruit, 0.1, 0.2, 0.1, 0.3
Fox, Animal, 1.0, 0.9, 1.2, 0.8
Dog, Animal, 1.2, 1.1, 0.8, 0.7
And here is the output from dput:
structure(list(Name = structure(c(1L, 2L, 3L, 6L, 5L, 4L), .Label = c("Apple",
"Banana", "Carrot", "Dog", "Fox", "Pear"), class = "factor"),
Group = structure(c(2L, 2L, 3L, 2L, 1L, 1L), .Label = c(" Animal",
" Fruit", " Vegetable"), class = "factor"), aaaA = c(1.23,
0.54, 0.01, 0.1, 1, 1.2), aaaE = c(0.45, 0.12, 0.05, 0.2,
0.9, 1.1), bbbR = c(0.3, 2, 0.45, 0.1, 1.2, 0.8), cccD = c(1.1,
1.32, 0.9, 0.3, 0.8, 0.7)), class = "data.frame", row.names = c(NA,
-6L))
To get a successful output for a single column, I do:
summary(aov(aaaA ~ Group, data=data))[[1]][["Pr(>F)"]]
I then try to implement that in a loop:
for(i in names(data[3:6])){
  out <- summary(aov(i ~ Group, data=data))[[1]][["Pr(>F)"]]
  write.csv(out, i)
}
Which returns the error:
Error in model.frame.default(formula = i ~ Group, data = test, drop.unused.levels = TRUE) :
variable lengths differ (found for 'Group')
Can anyone help with getting around the error or implementing a per-column ANOVA?
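For reference, here is a minimal sketch of the kind of result I'm after. It assumes the error arises because `i` in the loop is a character string (e.g. "aaaA") and not a reference to the column, so the left-hand side of the formula has length 1 while Group has length 6. The sketch builds each formula from the column name with `reformulate()` and collects the P-values into a dataframe. The names `value_cols`, `p_values` and `result` are just illustrative, and the example data is retyped from the dput output above with the leading spaces in the Group labels dropped.

```r
# Example data retyped from the dput output in the question
# (assumption: leading spaces in the Group labels are not meaningful).
data <- data.frame(
  Name  = c("Apple", "Banana", "Carrot", "Pear", "Fox", "Dog"),
  Group = factor(c("Fruit", "Fruit", "Vegetable", "Fruit", "Animal", "Animal")),
  aaaA  = c(1.23, 0.54, 0.01, 0.1, 1, 1.2),
  aaaE  = c(0.45, 0.12, 0.05, 0.2, 0.9, 1.1),
  bbbR  = c(0.3, 2, 0.45, 0.1, 1.2, 0.8),
  cccD  = c(1.1, 1.32, 0.9, 0.3, 0.8, 0.7)
)

# Columns to test (everything after Name and Group in this example).
value_cols <- names(data)[3:6]

# For each column name, build the formula <column> ~ Group from the string,
# fit the one-way ANOVA, and keep only the P-value for the Group term.
p_values <- sapply(value_cols, function(col) {
  f   <- reformulate("Group", response = col)
  fit <- aov(f, data = data)
  summary(fit)[[1]][["Pr(>F)"]][1]  # second entry is NA (Residuals row)
})

# One row per tested column, with its P-value.
result <- data.frame(column = value_cols, p_value = unname(p_values))
print(result)
```

With the full dataset, `names(data)[3:6]` would presumably become `names(data)[-(1:2)]`, and `result` could be written out once with `write.csv(result, "anova_p_values.csv", row.names = FALSE)` instead of one file per column (the file name here is only a placeholder).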