R how to find maximum value by a primary key and removing duplicates

Question

I have a dataset as below and want to reduce it by removing rows. In situation where I have more than 1 ATTACH value for the same exact combination of first three columns, I want to keep only rows that have maximum ATTACH value

BOM = c(rep("BOM1", 1), rep("BOM2", 2), rep("BOM3", 3))
PART = c(rep("A", 3), rep("D", 3))
WW = c(rep("WW01",3),rep("WW05",1),rep("WW06",2))
ATTACH = c(1,4,8,2,2,4)


df1 = data.frame(BOM,PART,WW,ATTACH )

The final output will have row numbers 1,3,4,6 and remaining rows will get deleted

Carlos Cinelli · Accepted Answer · 2015-06-18T23:01:08.273

1

You can do that easily with dplyr:

library(dplyr)
df1 %>% group_by(BOM, PART, WW) %>% summarise(ATTACH = max(ATTACH))
Source: local data frame [4 x 4]
Groups: BOM, PART

   BOM PART   WW ATTACH
1 BOM1    A WW01      1
2 BOM2    A WW01      8
3 BOM3    D WW05      2
4 BOM3    D WW06      4

edited Jun 18 '15 at 23:01

answered Jun 18 '15 at 22:46

Carlos Cinelli

11,354
9
43
66

Thanks, already using it and working nicely! – GabStacker Jun 19 '15 at 16:36

score 0 · Answer 2 · answered Jun 18 '15 at 22:22

You first want to split the data frame by the combination of the first three rows and then look at the maximal values for the ATTACH column within each split. Here is a solution (Not sure if order of the rows matters or is maintained here necessarily).

do.call(rbind.data.frame, lapply(
    split(df1, apply(df1[,1:3], 1, paste, collapse=" ")),
    function(x) if(nrow(x) == 1){x}
                else{x[which.max(x[,4]),]}))

R how to find maximum value by a primary key and removing duplicates

2 Answers2