Essentially, I am trying to make a series of plots with log2 fold-change on the y-axis and mean counts on the x-axis, where each point is a gene. These are commonly called MA plots. The issue I am having is getting my data into the right shape. I can do this with a loop, but I would like to know the idiomatic way to do it.
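Written out, the two plotted quantities look like this. The notation is assumed for illustration and does not appear in the question: $x^{G2}_{g,p}$ and $x^{G3}_{g,p}$ are the counts of gene $g$ in the G2 and G3 samples of patient-batch pair $p$.

$$
M_{g,p} = \log_2 \frac{x^{G3}_{g,p}}{x^{G2}_{g,p}}, \qquad
A_{g,p} = \frac{x^{G2}_{g,p} + x^{G3}_{g,p}}{2}
$$

The classic microarray MA plot instead uses the mean of the log intensities, $A = \tfrac{1}{2}\left(\log_2 x^{G2} + \log_2 x^{G3}\right)$; with the arithmetic mean of counts used here, the x-axis is usually drawn on a log scale.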
At this point, I have two data frames: my design matrix and my data matrix. The design matrix (call it ED_df) looks like this:
SampleID Patient Grade Batch
MD48L_2_B_L1 MD48 G2 Feb15
MD48R_3_B_L1 MD48 G3 Feb15
MD53L_2_B_L1 MD53 G2 Feb15
MD53R_3_B_L1 MD53 G3 Feb15
MD58L_2_B_L1 MD58 G2 Sep15
MD58R_3_B_L1 MD58 G3 Sep15
dim(ED_df)
# [1] 18 6
Each row is a unique sample, identified by its combination of patient, grade, and batch. Every patient is paired on grade, contributing one G2 and one G3 sample. There are three batches in total. Two patients were replicated across batches, each appearing in either batches 1 and 2 or batches 2 and 3.
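A minimal sketch of how this pairing could be encoded, assuming the column names shown above and using only the six rows that are visible. Each patient-batch combination becomes one comparison unit, stored in a `Pair` column (a hypothetical name introduced here), so a patient sequenced in two batches yields two separate pairs. A cross-tabulation then checks that every pair holds exactly one G2 and one G3 sample.

```r
# Hypothetical toy copy of the six ED_df rows shown above
ED_df <- data.frame(
  SampleID = c("MD48L_2_B_L1", "MD48R_3_B_L1", "MD53L_2_B_L1",
               "MD53R_3_B_L1", "MD58L_2_B_L1", "MD58R_3_B_L1"),
  Patient  = c("MD48", "MD48", "MD53", "MD53", "MD58", "MD58"),
  Grade    = c("G2", "G3", "G2", "G3", "G2", "G3"),
  Batch    = c("Feb15", "Feb15", "Feb15", "Feb15", "Sep15", "Sep15"),
  stringsAsFactors = FALSE
)

# One comparison unit = one patient within one batch, so a replicated
# patient contributes one pair per batch
ED_df$Pair <- paste(ED_df$Patient, ED_df$Batch, sep = "_")

# Cross-tabulate pairs against grades: every cell should be 1
pair_check <- table(ED_df$Pair, ED_df$Grade)
print(pair_check)
stopifnot(all(pair_check == 1))
```

On the full 18-row design, the same check would flag any pair that is missing a grade or has a duplicated sample.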
My data matrix (call it data_df) looks like this:
Gene MD48L_2_B_L1 MD48R_3_B_L1 MD53L_2_B_L1 MD53R_3_B_L1 MD58L_2_B_L1
1 ENSG00000000003 364.26079 329.28730 531.52188 371.67413 275.745038
2 ENSG00000000005 18.92264 49.89201 42.18428 19.42548 1.948728
3 ENSG00000000419 270.59373 261.65590 284.74386 414.41018 293.283591
4 ENSG00000000457 145.70432 125.28439 122.33440 129.50318 148.103342
dim(data_df)
# [1] 31707 18
Apart from Gene, each column corresponds to a unique sample, named by its SampleID in ED_df.
What I want is, for each gene, the log2 fold-change (G3/G2) within each patient-batch set, along with the mean of the G3 and G2 counts for that set.
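One loop-free way to get both quantities, sketched under the assumption that every patient-batch pair has exactly one G2 and one G3 sample: look up the G2 and G3 sample of each pair in the same order, pull those columns out of the count matrix as two aligned matrices, and let R's element-wise arithmetic do the work. The toy inputs reuse the first two genes and the MD48 and MD53 samples shown above (MD58 is left out because its G3 column is not shown); `Pair`, `x2`, `x3`, `M` and `A` are hypothetical names.

```r
# Hypothetical toy inputs built from the values shown above
data_df <- data.frame(
  Gene         = c("ENSG00000000003", "ENSG00000000005"),
  MD48L_2_B_L1 = c(364.26079, 18.92264),
  MD48R_3_B_L1 = c(329.28730, 49.89201),
  MD53L_2_B_L1 = c(531.52188, 42.18428),
  MD53R_3_B_L1 = c(371.67413, 19.42548),
  stringsAsFactors = FALSE
)
ED_df <- data.frame(
  SampleID = c("MD48L_2_B_L1", "MD48R_3_B_L1", "MD53L_2_B_L1", "MD53R_3_B_L1"),
  Patient  = c("MD48", "MD48", "MD53", "MD53"),
  Grade    = c("G2", "G3", "G2", "G3"),
  Batch    = "Feb15",
  stringsAsFactors = FALSE
)

# Numeric matrix: genes in rows, samples in columns
counts <- as.matrix(data_df[, -1])
rownames(counts) <- data_df$Gene

# Pair key, then the G3 rows reordered to line up with the G2 rows
ED_df$Pair <- paste(ED_df$Patient, ED_df$Batch, sep = "_")
g2 <- ED_df[ED_df$Grade == "G2", ]
g3 <- ED_df[ED_df$Grade == "G3", ]
g3 <- g3[match(g2$Pair, g3$Pair), ]

# Column k of x2 and column k of x3 now belong to the same pair
x2 <- counts[, g2$SampleID, drop = FALSE]
x3 <- counts[, g3$SampleID, drop = FALSE]

# Element-wise over whole matrices: no loop over genes or pairs
M <- log2(x3 / x2)   # log2 fold-change, G3 over G2
A <- (x3 + x2) / 2   # mean of the G2 and G3 counts
colnames(M) <- colnames(A) <- g2$Pair
print(round(M, 3))
```

Genes with a zero count in either sample give `-Inf`, `Inf` or `NaN` here; adding a small pseudocount (for example 1) to both counts before taking the ratio is a common workaround, but that is an assumption on top of the question, not part of it.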
I will then plot this as an MA plot.
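For the plotting step, a sketch assuming genes x pairs matrices `M` and `A` (hypothetical names, filled here with toy values computed from the counts shown above). Stacking them into one long table with a `Pair` column lets lattice's `xyplot()` draw one MA panel per patient-batch pair through the `| Pair` conditioning term; the log-scaled x-axis is a choice made here because mean counts typically span several orders of magnitude.

```r
library(lattice)  # ships with R as a recommended package

# Hypothetical toy M and A matrices (genes x pairs) from the counts shown above
M <- cbind(MD48_Feb15 = log2(c(329.28730, 49.89201) / c(364.26079, 18.92264)),
           MD53_Feb15 = log2(c(371.67413, 19.42548) / c(531.52188, 42.18428)))
A <- cbind(MD48_Feb15 = (c(329.28730, 49.89201) + c(364.26079, 18.92264)) / 2,
           MD53_Feb15 = (c(371.67413, 19.42548) + c(531.52188, 42.18428)) / 2)
rownames(M) <- rownames(A) <- c("ENSG00000000003", "ENSG00000000005")

# One row per gene and pair; as.vector() reads a matrix column by column,
# so genes cycle fastest and each pair name repeats nrow(M) times
ma_long <- data.frame(
  Gene = rep(rownames(M), times = ncol(M)),
  Pair = factor(rep(colnames(M), each = nrow(M))),
  M    = as.vector(M),
  A    = as.vector(A),
  stringsAsFactors = FALSE
)

# One MA panel per pair, with a reference line at zero fold-change
p <- xyplot(M ~ A | Pair, data = ma_long, pch = 20,
            scales = list(x = list(log = 10)),
            xlab = "Mean count (G2, G3)",
            ylab = "log2 fold-change (G3/G2)",
            panel = function(...) {
              panel.abline(h = 0, col = "grey")
              panel.xyplot(...)
            })
print(p)
```

The same long table would also work with ggplot2 and `facet_wrap(~ Pair)` if that is preferred.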
Again, I can see how to do this painfully with a nested for-loop; what I would like is to do it with some sort of aggregating function.
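An alternative to aggregating is reshaping, sketched here in base R with `merge()` and `reshape()`: melt the count table to one row per gene and sample, join the design columns on `SampleID`, then spread `Grade` back out so each gene-pair row holds its G2 and G3 counts side by side. Both quantities are then plain column arithmetic. The toy inputs repeat the visible values used earlier, and `long`, `wide` and `Pair` are hypothetical names; tidyr's `pivot_longer()` and `pivot_wider()` would do the same reshaping.

```r
# Hypothetical toy inputs built from the values shown above
data_df <- data.frame(
  Gene         = c("ENSG00000000003", "ENSG00000000005"),
  MD48L_2_B_L1 = c(364.26079, 18.92264),
  MD48R_3_B_L1 = c(329.28730, 49.89201),
  MD53L_2_B_L1 = c(531.52188, 42.18428),
  MD53R_3_B_L1 = c(371.67413, 19.42548),
  stringsAsFactors = FALSE
)
ED_df <- data.frame(
  SampleID = c("MD48L_2_B_L1", "MD48R_3_B_L1", "MD53L_2_B_L1", "MD53R_3_B_L1"),
  Patient  = c("MD48", "MD48", "MD53", "MD53"),
  Grade    = c("G2", "G3", "G2", "G3"),
  Batch    = "Feb15",
  stringsAsFactors = FALSE
)

# 1. Wide -> long: unlist() stacks the sample columns one after another
long <- data.frame(
  Gene     = rep(data_df$Gene, times = ncol(data_df) - 1),
  SampleID = rep(names(data_df)[-1], each = nrow(data_df)),
  Count    = unlist(data_df[, -1], use.names = FALSE),
  stringsAsFactors = FALSE
)

# 2. Attach patient, grade and batch, then build the pair key
long <- merge(long, ED_df, by = "SampleID")
long$Pair <- paste(long$Patient, long$Batch, sep = "_")

# 3. Long -> wide on Grade: creates columns Count.G2 and Count.G3
wide <- reshape(long[, c("Gene", "Pair", "Grade", "Count")],
                idvar = c("Gene", "Pair"), timevar = "Grade",
                direction = "wide")

# 4. Vectorised column arithmetic: one row per gene and pair
wide$M <- log2(wide$Count.G3 / wide$Count.G2)
wide$A <- (wide$Count.G3 + wide$Count.G2) / 2
print(wide[, c("Gene", "Pair", "M", "A")])
```

The resulting table has the same shape as what the MA plot needs, so it can be passed straight to a faceted plot with `Pair` as the panel variable.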