I want to filter the data frame to remove rows that occur with similar names in col0. If two or more similar names occur, I want to keep the row with the highest value in col1.

col0                col1     col2      col3         col4         col4      col5
hsa-let-7a-5p   2.487304 15.04636  8.400422 1.702870e-10 1.084728e-07 13.867065
hsa-let-7a-5p   2.491626 13.70345  7.414093 4.002913e-09 1.274928e-06 10.808433
hsa-let-7d-5p   3.074776 11.36059  6.799401 2.977052e-08 6.321274e-06  8.887774
hsa-miR-7d-5p   3.123776 11.84145  6.210222 2.069015e-07 3.050719e-05  7.032421
hsa-miR-122-5p -2.521427 13.91681 -6.132486 2.673240e-07 3.050719e-05  6.703794
hsa-miR-122-5p  2.602304 11.53867  6.083099 3.145797e-07 3.050719e-05  6.636385

In my example I want to keep row 2, row 4 and row 6. Any tips on a function?

user2300940

1 Answer

If this is a data.frame, it cannot have duplicated row names, so the object is either a matrix or the names are stored in the first column of the data.frame. Assuming the latter, group by the first column, i.e. 'col0', and slice the row with the maximum value in 'col1' within each group.

 library(dplyr)
 df1 %>%
    group_by(col0) %>%
    slice(which.max(col1))
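
For comparison, here is a minimal base R sketch of the same idea that does not need dplyr. It is an alternative technique, not the code above: sort by 'col1' in decreasing order, then drop every later occurrence of a name with `duplicated()`. The data frame is rebuilt by hand from the first two columns of the question, and the names `ord` and `res` are only illustrative.

```r
# Assumption: only col0 and col1 from the question are reproduced here;
# the remaining columns would be carried along unchanged.
df1 <- data.frame(
  col0 = c("hsa-let-7a-5p", "hsa-let-7a-5p", "hsa-let-7d-5p",
           "hsa-miR-7d-5p", "hsa-miR-122-5p", "hsa-miR-122-5p"),
  col1 = c(2.487304, 2.491626, 3.074776, 3.123776, -2.521427, 2.602304),
  stringsAsFactors = FALSE
)

# Put the largest col1 first, so the first row seen for each name
# is the one with the maximum col1.
ord <- df1[order(df1$col1, decreasing = TRUE), ]

# Keep only the first occurrence of each exact name in col0.
res <- ord[!duplicated(ord$col0), ]
res
```

Both this sketch and the grouped version match names exactly. 'hsa-let-7d-5p' and 'hsa-miR-7d-5p' are different strings, so row 3 is kept along with rows 2, 4 and 6. If those two names should count as the same, col0 would need to be normalised before grouping.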
akrun