I'm struggling to come up with a working solution to what seems like a fairly simple problem. I have a data frame with both data and factors in it, and I'd like to use the factors to decide which data points need to be subtracted from other data points to produce a new data frame of comparative values.
Here's what the data frame is like:
> str(means)
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 32 obs. of 5 variables:
$ rat : Factor w/ 8 levels "Rat1","Rat2",..: 1 1 1 1 2 2 2 2 3 3 ...
$ gene : Factor w/ 4 levels "gene1","gene2",..: 1 2 3 4 1 2 3 4 1 2 ...
$ gene_category: Factor w/ 2 levels "control","experimental": 2 2 1 1 2 2 1 1 2 2 ...
$ timepoint1 : num 23.4 18.3 42.1 40.1 25.3 ...
$ timepoint2 : num 23.5 18.4 41.5 39.9 22.8 ...
> head(means)
Source: local data frame [6 x 5]
Groups: rat, gene [6]
rat gene gene_category timepoint1 timepoint2
(fctr) (fctr) (fctr) (dbl) (dbl)
1 Rat1 gene1 experimental 23.36667 23.49667
2 Rat1 gene2 experimental 18.26000 18.38000
3 Rat1 gene3 control 42.05500 41.45000
4 Rat1 gene4 control 40.08667 39.89500
5 Rat2 gene1 experimental 25.29333 22.83000
6 Rat2 gene2 experimental 19.72667 19.19333
For each rat (8 rats in total), I'd like to subtract the 'control' gene values (genes 3 and 4) from the 'experimental' gene values (genes 1 and 2). I need to do this iteratively, so each experimental gene value must have each control gene value subtracted from it (within each rat, but not between rats). The above should be done for each timepoint column.
I've been fiddling with a solution using dplyr. I've got the grouping down, but I can't figure out how to do the rest:
diffs <- means %>% group_by(rat, gene, gene_category) # %>% ... this is where I don't know what to do
There is a solution to a similar problem here, but I think it will give me every possible pairwise operation, and that's not what I'm looking for. It also only deals with two factors, while I have three that I need to consider.
Here's another solution to a similar problem, but again there are some things about it that make it less than ideal. It deals with one factor only and I'm not sure how it could be applied to a dataset with three factors and two data vectors.
I know that this problem is solved when doing something like a pairwise comparison to determine statistical significance (multiple t-tests, ANOVA, MANOVA, etc.), but the packages and base stat functions I'm familiar with that run these tests keep this basic operation under the hood. I'd like a simple solution that uses as few loops as possible, with either base R or dplyr/plyr/reshape2, etc.
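To make the desired result concrete, here is a minimal base R sketch of the operation described above, run on a made-up toy data frame rather than the real data. The names means_toy, expt, ctrl, pairs and diffs are hypothetical, the values are invented, and only two rats are included. It assumes a plain data frame, so a grouped_df like means would probably need as.data.frame() first. The idea is to split the rows by gene_category and merge the two halves on rat alone, which pairs every experimental gene with every control gene within a rat but never across rats.

```r
# Toy data with the same columns as `means` (made-up values, 2 rats only).
means_toy <- data.frame(
  rat = factor(rep(c("Rat1", "Rat2"), each = 4)),
  gene = factor(rep(paste0("gene", 1:4), times = 2)),
  gene_category = factor(rep(c("experimental", "experimental",
                               "control", "control"), times = 2)),
  timepoint1 = c(23, 18, 42, 40, 25, 20, 41, 39),
  timepoint2 = c(24, 19, 41, 38, 23, 19, 40, 37)
)

# Split into experimental and control rows; the category column is no longer needed.
expt <- subset(means_toy, gene_category == "experimental", select = -gene_category)
ctrl <- subset(means_toy, gene_category == "control", select = -gene_category)

# Merging on `rat` alone yields every experimental x control pairing within each rat.
# Columns shared by both halves get the suffixes "_expt" and "_ctrl".
pairs <- merge(expt, ctrl, by = "rat", suffixes = c("_expt", "_ctrl"))

# Experimental minus control, for each timepoint column.
diffs <- data.frame(
  rat = pairs$rat,
  gene_expt = pairs$gene_expt,
  gene_ctrl = pairs$gene_ctrl,
  timepoint1_diff = pairs$timepoint1_expt - pairs$timepoint1_ctrl,
  timepoint2_diff = pairs$timepoint2_expt - pairs$timepoint2_ctrl
)

print(diffs)
```

With 2 experimental and 2 control genes per rat, this gives 4 rows per rat, so the full 8-rat data set would produce 32 rows of differences.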