Remove Duplicates by Unique Value in another Column

Question

I have a dataframe that looks like this:

  COLA       COLB     COLC
    A         nb       1
    A         nc       0.8
    A         bc       0.7
    A         nb       0.7  <------------
    B         nb       1
    B         nc       0.3  <------------
    B         nc       0.8
    B         aa       0.9

I want to remove the duplicates in COLB by COLA unique ID and keep the maximum value of that duplicate from COLC.

So I want the final result to look like this ( pointed to the rows I want to delete in the previous table):

  COLA       COLB     COLC
    A         nb       1
    A         nc       0.8
    A         bc       0.7
    B         nb       1
    B         nc       0.8
    B         aa       0.9

In addition to the one in the dupe tagged, a base R approach is `df1[!duplicated(df1[c(1,3)]),]` — akrun, Sep 14 '16 at 15:04
Another possible duplicate: http://stackoverflow.com/questions/2626567/collapsing-data-frame-by-selecting-one-row-per-group — Jaap, Sep 14 '16 at 15:06
How do you extract the maximum value though in column C for that duplicated row in COLA and COLB before removing the duplicates so you remove the correct row? @akrun — nak5120, Sep 14 '16 at 15:08

score 2 · Accepted Answer · answered Sep 14 '16 at 15:14

2

We can use dplyr. After arrangeing the 'COLA", and descendingly the 'COLC', we group by 'COLA', 'COLB' and get the first row with slice.

library(dplyr)
df1 %>%
   arrange(COLA, desc(COLC)) %>% 
   group_by(COLA, COLB) %>% 
   slice(1L)

answered Sep 14 '16 at 15:14

akrun

874,273
37
540
662

Remove Duplicates by Unique Value in another Column

1 Answers1