I have two data frames of expression values, with one row per gene and one column per sample, where the samples belong to different groups. I want to calculate the mean and median of each gene per group. My solution seems a bit verbose, and I wonder whether there is a more elegant one.
Data
# expression values
genes <- paste("gene",1:1000,sep="")
x <- list(
A = sample(genes,300),
B = sample(genes,525),
C = sample(genes,440),
D = sample(genes,350)
)
# expression dataframe
crete_exp_df <- function(gene_nr, sample_nr){
  df <- replicate(sample_nr, rnorm(gene_nr))
  rownames(df) <- paste("Gene", c(1:nrow(df)))
  colnames(df) <- paste("Sample", c(1:ncol(df)))
  return(df)
}
exp1 <- crete_exp_df(50, 20)
exp2 <- crete_exp_df(50, 20)
# sample annotation
san <- data.frame(
id = colnames(exp1),
group = sample(1:4, 20, replace = TRUE))
Solution
# %>%, filter() and pull() come from dplyr
library(dplyr)
# get ids of samples per group
ids_1 <- san %>% filter(group == 1) %>% pull(id)
ids_2 <- san %>% filter(group == 2) %>% pull(id)
ids_3 <- san %>% filter(group == 3) %>% pull(id)
ids_4 <- san %>% filter(group == 4) %>% pull(id)
id_list <- list(group1 = ids_1, group2 = ids_2, group3 = ids_3, group4 = ids_4)
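# fct means df1
# drop = FALSE keeps a matrix even if a group contains only one sample
get_means_exp1 <- function(id){
  apply(exp1[, id, drop = FALSE], 1, mean, na.rm = TRUE)
}
# fct means df2
get_means_exp2 <- function(id){
  apply(exp2[, id, drop = FALSE], 1, mean, na.rm = TRUE)
}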
# lapply on df1
list_means_exp1 <- lapply(id_list, get_means_exp1)
means_exp1 <- as.data.frame(list_means_exp1)
# lapply on df2
list_means_exp2 <- lapply(id_list, get_means_exp2)
means_exp2 <- as.data.frame(list_means_exp2)
I suppose this can be solved much more elegantly. Specifically, I would like to know how to get the ids per group more concisely and how to write a single function that works for both data frames. Looking forward to learning from your solutions, thanks a lot!
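To make the two sub-questions concrete, here is a minimal base-R sketch of one possible direction, not a definitive answer. It assumes that `split()` is acceptable for collecting the ids per group and that the expression object and the summary function can be passed in as arguments. The names `make_exp` and `group_stat` are hypothetical, and the balanced `group` column is a stand-in chosen only so the sketch is reproducible.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

# Small stand-ins for exp1, exp2 and san from the question
make_exp <- function(gene_nr, sample_nr) {
  m <- replicate(sample_nr, rnorm(gene_nr))
  dimnames(m) <- list(paste("Gene", seq_len(gene_nr)),
                      paste("Sample", seq_len(sample_nr)))
  m
}
exp1 <- make_exp(50, 20)
exp2 <- make_exp(50, 20)
san <- data.frame(id = colnames(exp1),
                  group = rep(1:4, each = 5))  # balanced groups, sketch only

# Sample ids per group in one step: a named list with one element per group
id_list <- split(as.character(san$id), paste0("group", san$group))

# One function for any expression matrix and any summary statistic
group_stat <- function(exp, ids, stat = mean) {
  # drop = FALSE keeps a matrix even if a group has a single sample
  sapply(ids, function(id) apply(exp[, id, drop = FALSE], 1, stat, na.rm = TRUE))
}

means_exp1   <- as.data.frame(group_stat(exp1, id_list, mean))
means_exp2   <- as.data.frame(group_stat(exp2, id_list, mean))
medians_exp1 <- as.data.frame(group_stat(exp1, id_list, median))
```

Each result has one row per gene and one column per group. For the mean specifically, `rowMeans(exp[, id, drop = FALSE], na.rm = TRUE)` could replace the `apply()` call and is typically faster on large matrices.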