1

I have a dataframe that looks like this:

  COLA       COLB     COLC
    A         nb       1
    A         nc       0.8
    A         bc       0.7
    A         nb       0.7  <------------
    B         nb       1
    B         nc       0.3  <------------
    B         nc       0.8
    B         aa       0.9

I want to remove the duplicates in COLB by COLA unique ID and keep the maximum value of that duplicate from COLC.

So I want the final result to look like this ( pointed to the rows I want to delete in the previous table):

  COLA       COLB     COLC
    A         nb       1
    A         nc       0.8
    A         bc       0.7
    B         nb       1
    B         nc       0.8
    B         aa       0.9
nak5120
  • 4,089
  • 4
  • 35
  • 94
  • In addition to the one in the dupe tagged, a base R approach is `df1[!duplicated(df1[c(1,3)]),]` – akrun Sep 14 '16 at 15:04
  • Another possible duplicate: http://stackoverflow.com/questions/2626567/collapsing-data-frame-by-selecting-one-row-per-group – Jaap Sep 14 '16 at 15:06
  • How do you extract the maximum value though in column C for that duplicated row in COLA and COLB before removing the duplicates so you remove the correct row? @akrun – nak5120 Sep 14 '16 at 15:08

1 Answers1

2

We can use dplyr. After arrangeing the 'COLA", and descendingly the 'COLC', we group by 'COLA', 'COLB' and get the first row with slice.

library(dplyr)
df1 %>%
   arrange(COLA, desc(COLC)) %>% 
   group_by(COLA, COLB) %>% 
   slice(1L) 
akrun
  • 874,273
  • 37
  • 540
  • 662