I ran a quantitative proteomics experiment to measure differential protein expression in cells between two conditions. The output is a list of peptides, the protein each one maps to, and their abundance in the experimental and control conditions. Each protein has several detected peptides, and I need to pull out the median peptide abundance per protein, per condition, into a new data frame. A simplified version of the data looks like this:
gene | peptide | condition 1 abundance | condition 2 abundance |
---|---|---|---|
protein 1 | A | 1 | 4 |
protein 1 | B | 2 | 5 |
protein 2 | A | 3 | 6 |
protein 2 | B | 3.5 | 7 |
protein 2 | C | 5 | |
Is there a way to write code for this in R? Note that I have about 6000 proteins, and about 60,000 detected peptides. Not all peptides were detected in both condition 1 and 2, but I would still need to take the median of all peptides per protein for each condition separately.
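A minimal sketch of the computation described above, using base R's `aggregate()`. The data frame name `peptides` and the column names `gene`, `cond1` and `cond2` are assumptions made for illustration, and an undetected peptide is assumed to be stored as `NA`:

```r
# Toy data mirroring the table above; all names here are hypothetical.
peptides <- data.frame(
  gene    = c("protein 1", "protein 1", "protein 2", "protein 2", "protein 2"),
  peptide = c("A", "B", "A", "B", "C"),
  cond1   = c(1, 2, 3, 3.5, 5),
  cond2   = c(4, 5, 6, 7, NA),  # NA: peptide not detected in condition 2
  stringsAsFactors = FALSE
)

# Median abundance per protein, computed separately for each condition.
# na.rm = TRUE is passed through to median(), so a peptide missing in one
# condition is ignored for that condition only.
medians <- aggregate(
  peptides[, c("cond1", "cond2")],
  by  = list(gene = peptides$gene),
  FUN = median,
  na.rm = TRUE
)

print(medians)
```

On the toy data this gives medians of 1.5 and 4.5 for protein 1, and 3.5 and 6.5 for protein 2. If a protein has no detected peptides at all in one condition, `median()` returns `NA` for that cell, so such proteins would need to be handled before any downstream comparison.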
The goal is to run a statistical comparison of each protein's median peptide abundance between the two conditions, so I can see whether the values are significantly different.
Thanks in advance!