I have a dataset which looks like this:
   uniprot site netphorest
1   C9J0A7  169   0.064921
3   C9J0A7  169   0.063045
4   C9J0A7  169   0.055366
9   C9J0A7  169   0.055366
10  C9J0A7  169   0.055366
11  C9J0A7  169   0.055577
14  C9J0A7  169   0.054875
15  C9J0A7  169   0.054875
16  C9J0A7  169   0.054875
22  C9J0A7  169   0.430742
23  C9J0A7  169   0.430742
There are multiple entries for the same uniprot identifier and modification site, and each entry has its own netphorest score (the likelihood of the site being modified by a particular enzyme). The data frame has over 42,000 observations. Essentially, I want to select the highest score for each uniprot/site combination.
I have tried to do something like this (CX1h is my data frame):
CX1href <- subset.data.frame(CX1h, netphorest = max)
Here I am trying to subset the rows based on the largest value in the netphorest column. However, my new data frame still contains the same number of entries as the original data frame. I'm not sure how to approach this, as I have multiple entries with the same uniprot code and site number.
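For reference, the call above most likely returns every row because `netphorest = max` is not a row condition: `subset.data.frame` expects a logical expression as its second argument, and a named argument it does not recognise is simply passed through and ignored. A minimal sketch of one way to keep only the top-scoring rows per group, using base R's `ave()`, might look like the following. The toy data frame is an assumption: its first four rows reuse values from the sample, and the site-200 rows are invented only to show a second group.

```r
# Toy data: the first four rows reuse values from the sample above;
# the site-200 rows are made up purely to show a second group.
CX1h <- data.frame(
  uniprot    = rep("C9J0A7", 6),
  site       = c(169, 169, 169, 169, 200, 200),
  netphorest = c(0.064921, 0.055366, 0.430742, 0.430742, 0.10, 0.20)
)

# ave() returns, for every row, the maximum score within its uniprot/site group.
group_max <- ave(CX1h$netphorest, CX1h$uniprot, CX1h$site, FUN = max)

# Keep only the rows whose score equals their group's maximum.
# Tied rows (like the two 0.430742 entries) are all kept.
CX1href <- CX1h[CX1h$netphorest == group_max, ]
print(CX1href)
```

Because ties are kept, this can return more than one row per uniprot/site pair; `unique(CX1href)` would collapse exact duplicates if that is not wanted.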
I also tried aggregate, and got this error:
CX1href <- aggregate.data.frame(netphorest = ~ uniprot + site, CX1h, FUN = mean, max)
Error in aggregate.data.frame(netphorest = ~uniprot + site, CX1h, FUN = mean, :
'by' must be a list
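As far as I can tell, the error arises because `aggregate.data.frame` is called directly with positional arguments: `netphorest = ~ ...` is swallowed as an extra named argument, `CX1h` becomes `x`, and the trailing `max` ends up in the `by` slot, which must be a list. A hedged sketch of the formula interface, calling the generic `aggregate()` with the response on the left of `~` and a single `FUN`, is shown below. The toy data is an assumption, with the site-200 rows invented for illustration.

```r
# Toy data: site-169 values come from the sample; site-200 rows are made up.
CX1h <- data.frame(
  uniprot    = rep("C9J0A7", 6),
  site       = c(169, 169, 169, 169, 200, 200),
  netphorest = c(0.064921, 0.055366, 0.430742, 0.430742, 0.10, 0.20)
)

# Formula interface: response ~ grouping variables, one summary function.
# Call the generic aggregate() so the formula method is dispatched.
CX1href <- aggregate(netphorest ~ uniprot + site, data = CX1h, FUN = max)
print(CX1href)
```

Unlike filtering the original rows, this returns exactly one row per uniprot/site pair and drops any columns not named in the formula.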