I have a dataset of genes organized into groups called 'loci'. For each locus, I want to select the gene with the highest score, comparing it only against the other genes in the same locus.
My input data looks like this:
```
   loci   Gene     Score
1:    1  AQP11 0.5566507
2:    1 CLNS1A 0.2811747
3:    1   RSF1 0.5269924
4:    2  CFDP1 0.4186066
5:    2  CHST6 0.5395135
```
For locus 1, the output should keep the gene with the highest score among its 3 genes. For locus 2, it should keep whichever of its 2 genes has the higher score.
For this example, the output I'm trying to get is:
```
   loci  Gene     Score
1:    1 AQP11 0.5566507 # highest score in locus 1
2:    2 CHST6 0.5395135 # highest score in locus 2
```
How can I keep only the highest-scoring row within each group? I'm not sure where to start.
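One possible starting point, sketched in base R and assuming the data is a plain data frame (the name `df` is chosen here only for illustration): compute each locus's maximum score with `ave()`, then keep the rows whose score equals that maximum.

```r
# Example data rebuilt as a plain data frame (`df` is an illustrative name)
df <- data.frame(
  loci  = c(1L, 1L, 1L, 2L, 2L),
  Gene  = c("AQP11", "CLNS1A", "RSF1", "CFDP1", "CHST6"),
  Score = c(0.556650698184967, 0.281174659729004, 0.526992380619049,
            0.418606609106064, 0.539513528347015),
  stringsAsFactors = FALSE
)

# ave() returns, for every row, the maximum Score within that row's locus
group_max <- ave(df$Score, df$loci, FUN = max)

# Keep rows whose Score equals their locus maximum (tied rows would all be kept)
top_genes <- df[df$Score == group_max, ]
print(top_genes)
```

Because the comparison is against the group maximum, a locus where two genes share the top score would return both of them.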
Input data:
```r
structure(list(loci = c(1L, 1L, 1L, 2L, 2L), Gene = c("AQP11",
"CLNS1A", "RSF1", "CFDP1", "CHST6"), Score = c(0.556650698184967,
0.281174659729004, 0.526992380619049, 0.418606609106064, 0.539513528347015
)), row.names = c(NA, -5L), class = c("data.table", "data.frame"
))
```
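The class attribute above shows the input is a `data.table`, so a sketch using data.table's own grouping may fit. It assumes the data.table package is installed and rebuilds the example as `dt` (an illustrative name). Inside `j`, `.I` holds the row numbers of `dt`, so `.I[which.max(Score)]` picks the row of the top-scoring gene in each locus.

```r
library(data.table)

# Example data rebuilt as a data.table (`dt` is an illustrative name)
dt <- data.table(
  loci  = c(1L, 1L, 1L, 2L, 2L),
  Gene  = c("AQP11", "CLNS1A", "RSF1", "CFDP1", "CHST6"),
  Score = c(0.556650698184967, 0.281174659729004, 0.526992380619049,
            0.418606609106064, 0.539513528347015)
)

# For each locus, return the original row number of the highest Score.
# which.max() returns the first maximum, so ties resolve to one row.
idx <- dt[, .I[which.max(Score)], by = loci]$V1

# Subset the original table to those rows
top_genes <- dt[idx]
print(top_genes)
```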
I've been trying `dplyr::group_by()`, but I keep getting various errors.
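`group_by()` on its own only attaches grouping information to the data; a row-selecting verb has to follow it. A minimal dplyr sketch, assuming dplyr 1.0.0 or later (the release that introduced `slice_max()`) and using `df` as an illustrative name for the data:

```r
library(dplyr)

# Example data rebuilt as a plain data frame (`df` is an illustrative name)
df <- data.frame(
  loci  = c(1L, 1L, 1L, 2L, 2L),
  Gene  = c("AQP11", "CLNS1A", "RSF1", "CFDP1", "CHST6"),
  Score = c(0.556650698184967, 0.281174659729004, 0.526992380619049,
            0.418606609106064, 0.539513528347015),
  stringsAsFactors = FALSE
)

top_genes <- df %>%
  group_by(loci) %>%                                # one group per locus
  slice_max(Score, n = 1, with_ties = FALSE) %>%    # top-scoring row in each group
  ungroup()                                         # drop grouping for later steps

print(top_genes)
```

With `with_ties = FALSE`, exactly one gene is returned per locus even if two genes share the top score. On older dplyr versions, `filter(Score == max(Score))` after `group_by(loci)` is a common alternative; it keeps tied rows.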