Filter duplicates based on condition in another column

Question

I would like to filter my data frame to remove duplicated IDs in "Gene", keeping only the row with the lowest "Pval" for each gene. Please see my example:

Input:

Gene Pval
buc  0.01
buc  0.3
abad 0.0002
abad 0.01
myc  0.1
p53  0.03

Desired output:

Gene Pval
buc  0.01
abad 0.0002
myc  0.1
p53  0.03

What is the criterion for picking the Pval out of the possible options, e.g. for gene buc, 0.01 vs 0.3? It looks like you need something like `dplyr::group_by(df, Gene) %>% dplyr::summarise(Pval = min(Pval, na.rm = TRUE)) %>% dplyr::ungroup()` - Probel, Jun 24 '19 at 12:51

NelsonGon · Answer 1 · 2019-06-24T12:59:46.277

1

We can use:

library(dplyr)

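df %>%
  group_by(Gene) %>%
  # keep the row(s) with the lowest Pval within each gene
  filter(Pval == min(Pval)) %>%
  # drop exact duplicate rows left over when the minimum is tied
  unique()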
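For a self-contained check, here is a minimal sketch that rebuilds the example data from the question and uses `slice_min()` instead of `filter()` plus `unique()`. This is an alternative, not the approach above: it assumes dplyr 1.0.0 or later (where `slice_min()` was introduced), and the names `df` and `result` are only illustrative.

```r
library(dplyr)

# Example data copied from the question
df <- data.frame(
  Gene = c("buc", "buc", "abad", "abad", "myc", "p53"),
  Pval = c(0.01, 0.3, 0.0002, 0.01, 0.1, 0.03),
  stringsAsFactors = FALSE
)

# Keep the row with the smallest Pval for each Gene.
# with_ties = FALSE returns exactly one row per group,
# even when several rows share the minimum Pval.
result <- df %>%
  group_by(Gene) %>%
  slice_min(Pval, n = 1, with_ties = FALSE) %>%
  ungroup()

print(result)
```

Unlike the `filter()` version, this should return the rows grouped by `Gene` rather than in their original order, so the row order may differ from the expected output in the question even though the same four rows are kept.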

edited Jun 24 '19 at 12:59

answered Jun 24 '19 at 12:54

NelsonGon
