I have a dataset of genes arranged in groups called 'loci'. I want to select the gene with the highest score in each locus, comparing it only against the other genes in the same locus/group.

My input data looks like this:

    loci Gene     Score
1:    1  AQP11   0.5566507
2:    1 CLNS1A   0.2811747
3:    1   RSF1   0.5269924
4:    2  CFDP1   0.4186066
5:    2  CHST6   0.5395135

The output should contain the gene with the highest score among the 3 genes in locus 1, and likewise the gene with the higher score of the 2 genes in locus 2.

So the output from this example I'm trying to get is:

     loci  Gene     Score
1:    1    AQP11   0.5566507 #highest score in loci 1
2:    2    CHST6   0.5395135 #highest score in loci 2

How can I filter for the highest score within each group of rows? I'm not sure where to start with this.

Input data:

structure(list(loci = c(1L, 1L, 1L, 2L, 2L), Gene = c("AQP11", 
"CLNS1A", "RSF1", "CFDP1", "CHST6"), Score = c(0.556650698184967, 
0.281174659729004, 0.526992380619049, 0.418606609106064, 0.539513528347015
)), row.names = c(NA, -5L), class = c("data.table", "data.frame"
))

I've been trying to do this with dplyr::group_by(), but I keep getting various errors.

Cettt
DN1

3 Answers

Using dplyr:

> library(dplyr)
> df %>% group_by(loci) %>% filter(Score == max(Score))
# A tibble: 2 x 3
# Groups:   loci [2]
   loci Gene  Score
  <dbl> <chr> <dbl>
1     1 AQP11 0.557
2     2 CHST6 0.540
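
A possible variation, not part of the original answer: `filter(Score == max(Score))` keeps every row that ties for the top score, so a locus with two equally scored genes would return two rows. If exactly one row per locus is wanted, `slice_max()` with `with_ties = FALSE` is one way to get it (this assumes dplyr 1.0.0 or later). The sketch rebuilds the question's data under the name `df`; `top_per_locus` is a name chosen here for illustration.

```r
library(dplyr)

# Input data from the question, rebuilt as a plain data frame.
df <- data.frame(
  loci  = c(1L, 1L, 1L, 2L, 2L),
  Gene  = c("AQP11", "CLNS1A", "RSF1", "CFDP1", "CHST6"),
  Score = c(0.556650698184967, 0.281174659729004, 0.526992380619049,
            0.418606609106064, 0.539513528347015)
)

# Keep exactly one row per locus, even when genes tie for the top score.
top_per_locus <- df %>%
  group_by(loci) %>%
  slice_max(Score, n = 1, with_ties = FALSE) %>%
  ungroup()

top_per_locus
```

With the sample data there are no ties, so this should select the same two genes as the `filter()` version.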
Karthik S

In data.table:

library(data.table)
setDT(df)
df[, .SD[which.max(Score)], by = loci]

   loci  Gene     Score
1:    1 AQP11 0.5566507
2:    2 CHST6 0.5395135
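
As a hedged aside to the answer above: a common alternative idiom collects the row numbers of each group's maximum with `.I` and then subsets once, which avoids building `.SD` for every group and tends to matter only on large tables. The sketch below rebuilds the question's data as `df`; `idx` and `top_per_locus` are illustrative names, not from the original.

```r
library(data.table)

# Input data from the question.
df <- data.table(
  loci  = c(1L, 1L, 1L, 2L, 2L),
  Gene  = c("AQP11", "CLNS1A", "RSF1", "CFDP1", "CHST6"),
  Score = c(0.556650698184967, 0.281174659729004, 0.526992380619049,
            0.418606609106064, 0.539513528347015)
)

# .I holds row numbers in the full table; which.max() picks the first
# top-scoring row within each locus. The unnamed result column is V1.
idx <- df[, .I[which.max(Score)], by = loci]$V1

# Subset the original table once with the collected row numbers.
top_per_locus <- df[idx]

top_per_locus
```

Like `which.max()` in the original answer, this keeps only the first row when several genes in a locus share the top score.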
s_baldur

A base R option using subset

subset(dt, ave(Score, loci, FUN = max) == Score)

giving

   loci  Gene     Score
1:    1 AQP11 0.5566507
2:    2 CHST6 0.5395135

Another base R option using aggregate

aggregate(. ~ loci, dt[with(dt, order(-Score, loci)), ], head, 1)

giving

  loci  Gene             Score
1    1 AQP11 0.556650698184967
2    2 CHST6 0.539513528347015
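
A third base R sketch, offered as an assumption-laden alternative rather than part of the original answer: sort so the best gene of each locus comes first, then drop the later rows of each locus with `duplicated()`. Unlike the `ave()` comparison, this returns a single row per locus even when scores tie. The data are rebuilt here as a plain data frame named `df`; `sorted` and `top_per_locus` are illustrative names.

```r
# Input data from the question, as a plain data frame.
df <- data.frame(
  loci  = c(1L, 1L, 1L, 2L, 2L),
  Gene  = c("AQP11", "CLNS1A", "RSF1", "CFDP1", "CHST6"),
  Score = c(0.556650698184967, 0.281174659729004, 0.526992380619049,
            0.418606609106064, 0.539513528347015)
)

# Sort by locus, then by descending score, so each locus's best gene is first.
sorted <- df[order(df$loci, -df$Score), ]

# duplicated() flags every row after the first within a locus; drop those.
top_per_locus <- sorted[!duplicated(sorted$loci), ]

top_per_locus
```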
ThomasIsCoding