1

I have a dataset as below and want to reduce it by removing rows. In situation where I have more than 1 ATTACH value for the same exact combination of first three columns, I want to keep only rows that have maximum ATTACH value

BOM = c(rep("BOM1", 1), rep("BOM2", 2), rep("BOM3", 3))
PART = c(rep("A", 3), rep("D", 3))
WW = c(rep("WW01",3),rep("WW05",1),rep("WW06",2))
ATTACH = c(1,4,8,2,2,4)


df1 = data.frame(BOM,PART,WW,ATTACH )

The final output will have row numbers 1,3,4,6 and remaining rows will get deleted

GabStacker
  • 25
  • 4

2 Answers2

1

You can do that easily with dplyr:

library(dplyr)
df1 %>% group_by(BOM, PART, WW) %>% summarise(ATTACH = max(ATTACH))
Source: local data frame [4 x 4]
Groups: BOM, PART

   BOM PART   WW ATTACH
1 BOM1    A WW01      1
2 BOM2    A WW01      8
3 BOM3    D WW05      2
4 BOM3    D WW06      4
Carlos Cinelli
  • 11,354
  • 9
  • 43
  • 66
0

You first want to split the data frame by the combination of the first three rows and then look at the maximal values for the ATTACH column within each split. Here is a solution (Not sure if order of the rows matters or is maintained here necessarily).

do.call(rbind.data.frame, lapply(
    split(df1, apply(df1[,1:3], 1, paste, collapse=" ")),
    function(x) if(nrow(x) == 1){x}
                else{x[which.max(x[,4]),]}))
cr1msonB1ade
  • 1,716
  • 9
  • 14