
I want to identify and remove observations which are duplicates in certain aspects.

In my example, I want to get rid of rows 1 and 6, as they are the same in both V1 and V2. That they differ in V3 shouldn't matter.

df <- data.frame(V1 = c("a","b","c","a","c","a"),
                 V2 = c(1,2,1,2,3,1),
                 V3 = c(1,2,3,4,5,6))

Applying dplyr::distinct(df, V1, V2) results in row 6 being discarded while row 1 remains. As I said, I want both rows 1 and 6 removed. I am sure the problem is trivial, but I can't think of the correct search terms ...

Thanks!

Klaus Peter
  • ```df[!(duplicated(df[c(1,2)]) | duplicated(df[c(1,2)], fromLast = TRUE)), ]``` – M-- Feb 25 '23 at 07:37
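The base R one-liner in the comment above can be unpacked into steps. This is only a sketch of why it works: `duplicated()` flags the second and later copies of a key, and `fromLast = TRUE` scans from the bottom and so flags the earlier copies instead. Combining both flags marks every row whose (V1, V2) pair occurs more than once. The names `key`, `dup_forward`, `dup_backward` and `result` are made up for illustration.

```r
df <- data.frame(V1 = c("a","b","c","a","c","a"),
                 V2 = c(1,2,1,2,3,1),
                 V3 = c(1,2,3,4,5,6))

# Only V1 and V2 decide whether two rows count as duplicates
key <- df[c("V1", "V2")]

dup_forward  <- duplicated(key)                   # TRUE for row 6 (a later copy)
dup_backward <- duplicated(key, fromLast = TRUE)  # TRUE for row 1 (an earlier copy)

# Keep the rows that are not flagged in either direction
result <- df[!(dup_forward | dup_backward), ]
result
```

Rows 1 and 6 are dropped and the remaining rows keep their original row names.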

2 Answers


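We can group by V1 and V2, then keep only the groups that contain a single row:

library(dplyr)

group_by(df, V1, V2) %>%
  filter(n() == 1) %>%  # n() is the number of rows in the current group
  ungroup()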
# # A tibble: 4 × 3
#   V1       V2    V3
#   <chr> <dbl> <dbl>
# 1 b         2     2
# 2 c         1     3
# 3 a         2     4
# 4 c         3     5
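To see what the filter is acting on, it may help to write the group size into a column first. The sketch below uses `dplyr::add_count()` for that and is a variation on the answer rather than a replacement for it. The column name `group_size` and the variable names `counted` and `result` are made up for illustration.

```r
library(dplyr)

df <- data.frame(V1 = c("a","b","c","a","c","a"),
                 V2 = c(1,2,1,2,3,1),
                 V3 = c(1,2,3,4,5,6))

# Add a column holding how many rows share each (V1, V2) combination
counted <- add_count(df, V1, V2, name = "group_size")

# Keep combinations that occur exactly once, then drop the helper column
result <- counted %>%
  filter(group_size == 1) %>%
  select(-group_size)
result
```

Because no grouping is attached to the data here, there is nothing to `ungroup()` afterwards.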
r2evans

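Using data.table:

library(data.table)

setDT(df)  # converts df to a data.table by reference (no copy)

# .N is the number of rows in each (V1, V2) group; keep groups of size 1
df[, .SD[.N == 1], by = .(V1, V2)]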

   V1 V2 V3
1:  b  2  2
2:  c  1  3
3:  a  2  4
4:  c  3  5
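A possible variation, sketched here under the assumption that the original `df` should stay a plain data.frame: copy it with `as.data.table()` instead of converting in place, and return `.SD` only for groups of size one. Groups for which the `if` yields `NULL` are simply left out of the result. The names `dt` and `result` are made up for illustration.

```r
library(data.table)

df <- data.frame(V1 = c("a","b","c","a","c","a"),
                 V2 = c(1,2,1,2,3,1),
                 V3 = c(1,2,3,4,5,6))

# as.data.table() makes a copy, so df itself is not modified
dt <- as.data.table(df)

# Return the group's rows only when the group has exactly one row
result <- dt[, if (.N == 1L) .SD, by = .(V1, V2)]
result
```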
S-SHAAF