Drop rows which are duplicates regarding certain columns

Question

I want to identify and remove observations that are duplicated with respect to certain columns.

In my example, I want to get rid of rows 1 and 6, as they are the same in both V1 and V2. That they differ in V3 shouldn't matter.

df <- data.frame(V1 = c("a","b","c","a","c","a"),
                 V2 = c(1,2,1,2,3,1),
                 V3 = c(1,2,3,4,5,6))

Applying dplyr::distinct(df, V1, V2) discards row 6 but keeps row 1. As I said, I want both rows 1 and 6 removed. I am sure the problem is trivial, but I can't think of the correct search terms ...

Thanks!

`df[!(duplicated(df[c(1,2)]) | duplicated(df[c(1,2)], fromLast = TRUE)), ]` – M--, Feb 25 '23 at 07:37
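The base R one-liner in the comment above can be unpacked step by step. This is a minimal sketch that reuses df from the question; the intermediate names (key, dup_forward, dup_backward, in_dup_group, result) are introduced here only for illustration.

```r
df <- data.frame(V1 = c("a","b","c","a","c","a"),
                 V2 = c(1,2,1,2,3,1),
                 V3 = c(1,2,3,4,5,6))

# Only these columns decide whether two rows count as duplicates
key <- df[c("V1", "V2")]

# duplicated() marks every occurrence after the first one ...
dup_forward <- duplicated(key)
# ... and with fromLast = TRUE it marks every occurrence before the last one
dup_backward <- duplicated(key, fromLast = TRUE)

# A row belongs to a duplicated group if either scan marks it
in_dup_group <- dup_forward | dup_backward

# Keep only rows whose (V1, V2) combination occurs exactly once
result <- df[!in_dup_group, ]
result
```

Combining the forward and backward scans is what removes the first occurrence as well, which a single call to duplicated() (or distinct()) would keep.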

score 2 · Accepted Answer · answered Feb 24 '23 at 23:32


We can group by the key columns, then filter:

library(dplyr)

# Keep only rows whose (V1, V2) combination occurs exactly once
group_by(df, V1, V2) %>%
  filter(n() == 1) %>%
  ungroup()
# # A tibble: 4 × 3
#   V1       V2    V3
#   <chr> <dbl> <dbl>
# 1 b         2     2
# 2 c         1     3
# 3 a         2     4
# 4 c         3     5

r2evans


Or, with dplyr 1.1.0 or later, `filter(df, n() == 1, .by = c(V1, V2))` – Jon Spring Feb 24 '23 at 23:35
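For completeness, the `.by` variant from the comment above can be run as a self-contained snippet. This is a sketch that assumes dplyr 1.1.0 or later is installed (the `.by` argument is not available in earlier versions); the name result is used only for illustration.

```r
library(dplyr)  # assumes dplyr >= 1.1.0, which introduced the .by argument

df <- data.frame(V1 = c("a","b","c","a","c","a"),
                 V2 = c(1,2,1,2,3,1),
                 V3 = c(1,2,3,4,5,6))

# Per-operation grouping: n() is evaluated within each (V1, V2) group,
# and the result comes back ungrouped, so no ungroup() call is needed
result <- filter(df, n() == 1, .by = c(V1, V2))
result
```

Compared with group_by() followed by ungroup(), the grouping here applies to this one filter() call only.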

score 1 · Answer 2 · answered Feb 25 '23 at 01:11


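Using data.table:

library(data.table)

setDT(df)  # convert df to a data.table in place (by reference)

# Within each (V1, V2) group, keep the rows only if the group has exactly one row
df[, .SD[.N == 1], by = .(V1, V2)]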

   V1 V2 V3
1:  b  2  2
2:  c  1  3
3:  a  2  4
4:  c  3  5

S-SHAAF

