
I have a data set (data.frame) composed mainly of duplicated rows. I'd like to subset it so that I keep only the rows that had no duplicates in the original data. Example of the original df:

df1 <- data.frame( ID = c("1", "1","2","2","3","3","4","4"), V_1 = c(2011,2011,NA,NA,1,10,7,7), V_2= c(5,5,6,6,15,15,8,2))    

The answer I'd like to get:

df2 <- data.frame( ID = c("3","3","4","4"), V_1 = c(1,10,7,7), V_2= c(15,15,8,2))    

Using unique(df1) gives me the complete set of distinct rows (6 in this example), because it keeps one copy of each duplicated row instead of dropping those rows entirely. Since my df has more than 200k rows, that is not adequate for me.

I have searched similar questions (Subset with unique cases, based on multiple columns; Filter rows based on multiple column conditions R), but none of them helped me.

Gil33

2 Answers


Here you go

# Mark a row as duplicated if it matches an earlier row (plain duplicated())
# or a later row (duplicated() on the row-reversed data frame, reversed back)
duplicated.i <- duplicated(df1) | duplicated(df1[nrow(df1):1, ])[nrow(df1):1]
df2 <- df1[!duplicated.i, ]

#   ID V_1 V_2
# 5  3   1  15
# 6  3  10  15
# 7  4   7   8
# 8  4   7   2
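Equivalently, `duplicated()` takes a `fromLast` argument that scans from the other end, which avoids the manual row reversal (a base-R sketch; `dup` is just an illustrative name):

```r
df1 <- data.frame(ID  = c("1", "1", "2", "2", "3", "3", "4", "4"),
                  V_1 = c(2011, 2011, NA, NA, 1, 10, 7, 7),
                  V_2 = c(5, 5, 6, 6, 15, 15, 8, 2))

# TRUE for every row that appears more than once, whether the
# matching copy comes before or after it
dup <- duplicated(df1) | duplicated(df1, fromLast = TRUE)
df2 <- df1[!dup, ]
df2
#   ID V_1 V_2
# 5  3   1  15
# 6  3  10  15
# 7  4   7   8
# 8  4   7   2
```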
1

We can find the IDs of the duplicated rows and exclude every row with those IDs from the data frame

df1[!df1$ID %in% df1[duplicated(df1), 1], ]

#  ID V_1 V_2
#5  3   1  15
#6  3  10  15
#7  4   7   8
#8  4   7   2
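The same per-ID logic can also be written with `ave()` and `anyDuplicated()`, dropping every ID group that contains at least one duplicated row (a base-R sketch; `keep` is just an illustrative name):

```r
df1 <- data.frame(ID  = c("1", "1", "2", "2", "3", "3", "4", "4"),
                  V_1 = c(2011, 2011, NA, NA, 1, 10, 7, 7),
                  V_2 = c(5, 5, 6, 6, 15, 15, 8, 2))

# For each ID group, anyDuplicated() returns 0 when all of that group's
# rows are distinct; ave() spreads the group result back over its rows
keep <- ave(seq_len(nrow(df1)), df1$ID,
            FUN = function(i) anyDuplicated(df1[i, ])) == 0
df1[keep, ]
#   ID V_1 V_2
# 5  3   1  15
# 6  3  10  15
# 7  4   7   8
# 8  4   7   2
```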
Ronak Shah