In a data frame of features (eg proteins or mRNA transcripts), I want to calculate a one-way ANOVA for every column (representing expression values). Each row represents a sample (id) that belongs to a group (5 groups in example data).
Searching the web and Stackoverflow only gave me non-functional and partial answers, the expected result would be a table with the ANOVA p-value for every feature (eg a df with first col = feature, second col = ANOVA p-value). Below the example data I post a non-working solution.
For pairwise differential expression analysis between the groups, I am using LIMMA which delivers multiple-comparison corrected p-values. If I understand correctly, ANOVA does not care for mul.comp.cor but only shows for which features there is a difference between any of the 5 groups (and a post-test, eg LIMMA, shows me in which of the groups there is the difference). It would be great if you could confirm that this way of assessing differential expression between my 5 groups is correct.
The solution was taken from here: One-way ANOVA for loop: how do I iterate through multiple columns of a dataframe?
example data
# gene df and list
genes <- paste("gene",1:1000,sep="")
x <- list(
A = sample(genes,300),
B = sample(genes,525),
C = sample(genes,440),
D = sample(genes,350)
)
# expression dataframe
crete_exp_df <- function(gene_nr, sample_nr){
df <- replicate(sample_nr, rnorm(gene_nr))
rownames(df) <- paste("Gene", c(1:nrow(df)))
colnames(df) <- paste("Sample", c(1:ncol(df)))
return(df)
}
df1 <- crete_exp_df(50, 20)
df1 <- as.data.frame(df1)
df1$fid <- rownames(df1)
# creator for ANOVA
df4ANOVA <- df1 %>% pivot_longer(-fid) %>%
pivot_wider(names_from="fid", values_from="value") %>%
rename(id=name)
df4ANOVA$group <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5)
solution (not working)
library(tidyverse)
library(broom)
df4ANOVA$group <- as.factor(df4ANOVA$group )
df4ANOVA$id <- NULL
# ANOVA
baseformula <- " ~ group"
ANOVA_group <- for (i in 2:ncol(exp4anov)) {
formula <- paste(colnames(exp4anov)[i], baseformula, sep="")
p <- summary(aov(as.formula(formula), data=exp4anov))[[1]][["Pr(>F)"]][1]
print(paste(formula, ": p=", p, sep=""))
}