I have a list of data frames that I generated from a single data frame, as follows. gene_expression contains columns identifying each observation's gene, chromosomal region, cell line, and expression level. Multiple genes may belong to the same chromosomal region, and I want to compare expression in each chromosomal region against all other regions. It looks something like this:
gene region cell_line expression
A    X      Joe       1
B    X      Joe       2
C    Y      Joe       2
D    Z      Joe       3
E    Z      Joe       0
A    X      Claire    2
B    X      Claire    1
C    Y      Claire    3
D    Z      Claire    3
E    Z      Claire    1
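To make the question reproducible, here is the example table above rebuilt as an R data frame. The values are copied directly from the table; nothing else is assumed.

```r
# Example data from the table above (one row per gene per cell line)
gene_expression <- data.frame(
  gene       = rep(c("A", "B", "C", "D", "E"), times = 2),
  region     = rep(c("X", "X", "Y", "Z", "Z"), times = 2),
  cell_line  = rep(c("Joe", "Claire"), each = 5),
  expression = c(1, 2, 2, 3, 0, 2, 1, 3, 3, 1)
)
```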
I split it based on cell line.
library(dplyr)

gene_expression_groups <- gene_expression %>%
  group_by(cell_line) %>%
  group_split()
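For comparison, base R's split() performs the same split by cell line but returns an ordinary named list. Unlike the typed list that group_split() returns, a plain list accepts any object when you assign into one of its elements. This is a sketch of the base R alternative, not what the code above does.

```r
# Example data from the question
gene_expression <- data.frame(
  gene       = rep(c("A", "B", "C", "D", "E"), times = 2),
  region     = rep(c("X", "X", "Y", "Z", "Z"), times = 2),
  cell_line  = rep(c("Joe", "Claire"), each = 5),
  expression = c(1, 2, 2, 3, 0, 2, 1, 3, 3, 1)
)

# Base R split: a plain list named by cell line (names sorted alphabetically)
gene_expression_groups <- split(gene_expression, gene_expression$cell_line)
names(gene_expression_groups)
```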
Next, for each cell line and each region, I try to compute the mean expression (plus the variance and count) across all other regions via:
for (i in seq_along(gene_expression_groups)) {
  gene_expression_groups[[i]] <- as_tibble(gene_expression_groups[[i]]) %>%
    group_by(region) %>%
    # remove this region's rows from the full group, then summarise the rest
    group_modify(~ anti_join(gene_expression_groups[[i]], .) %>%
      summarize(muG = mean(expression),
                sigma2G = var(expression),
                NG = n()))
}
Ideally, the end product would be a new list in which each element looks like this (shown for Joe):
region cell_line mean_other standard_deviation_other
X      Joe       1.67       some number
Y      Joe       1.5        some number
Z      Joe       1.67       some number
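The target numbers can be checked with a short base R sketch. For each region it takes the expression values from every row whose region differs (for Joe and region X, that is 2, 3, 0 from Y and Z, so the mean is 5/3 ≈ 1.67). The helper name summarise_other_regions is hypothetical, and this uses base R filtering instead of anti_join(); it reports both the standard deviation and the variance, since the code above computes variance while the desired table asks for a standard deviation.

```r
# Example data from the question
gene_expression <- data.frame(
  gene       = rep(c("A", "B", "C", "D", "E"), times = 2),
  region     = rep(c("X", "X", "Y", "Z", "Z"), times = 2),
  cell_line  = rep(c("Joe", "Claire"), each = 5),
  expression = c(1, 2, 2, 3, 0, 2, 1, 3, 3, 1)
)

# Hypothetical helper: for one cell line's rows, summarise expression over
# all rows belonging to regions OTHER than each region in turn.
summarise_other_regions <- function(df) {
  regions <- sort(unique(df$region))
  rows <- lapply(regions, function(r) {
    other <- df$expression[df$region != r]  # rows from all other regions
    data.frame(
      region     = r,
      cell_line  = df$cell_line[1],
      mean_other = mean(other),
      sd_other   = sd(other),
      var_other  = var(other),
      n_other    = length(other)
    )
  })
  do.call(rbind, rows)
}

# Results go into a new plain list, one data frame per cell line
groups <- split(gene_expression, gene_expression$cell_line)
result <- lapply(groups, summarise_other_regions)
result$Joe
```

For Joe the mean_other column comes out as 1.67, 1.5, 1.67, matching the expected output.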
Unfortunately, this code only results in "Error: can't convert from <grouped_df> to <tbl_df> due to loss of precision", and I'm not sure what to do. Any help would be appreciated!
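The error most likely comes from the assignment rather than the summarising. In dplyr 1.0 and later, group_split() returns a vctrs list_of whose elements must all match the prototype of the original ungrouped tibble, so writing a grouped (and differently shaped) result back into gene_expression_groups[[i]] fails the cast. A minimal sketch, assuming dplyr >= 1.0: leave the split list untouched and collect the results in a new list with lapply(), using the group key .y to select the other regions instead of anti_join().

```r
library(dplyr)

# Example data from the question
gene_expression <- tibble(
  gene       = rep(c("A", "B", "C", "D", "E"), times = 2),
  region     = rep(c("X", "X", "Y", "Z", "Z"), times = 2),
  cell_line  = rep(c("Joe", "Claire"), each = 5),
  expression = c(1, 2, 2, 3, 0, 2, 1, 3, 3, 1)
)

gene_expression_groups <- gene_expression %>%
  group_by(cell_line) %>%
  group_split()

# Build a NEW plain list instead of assigning into the list_of
other_region_stats <- lapply(gene_expression_groups, function(grp) {
  grp %>%
    group_by(cell_line, region) %>%
    group_modify(~ {
      # .y holds this group's keys; keep expression from every other region
      other <- grp$expression[grp$region != .y$region]
      tibble(mean_other = mean(other),
             sd_other   = sd(other),
             n_other    = length(other))
    }) %>%
    ungroup()
})
```

Because group_by() sorts the keys, the second element of other_region_stats holds Joe's results.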