How to filter a dataframe based on the duplicated values?

Question

I have this df

df = data.frame(x = c(1,1,2,2,3,4),
                y = LETTERS[1:6] )

The desired output is

  x y
1 1 A
2 1 B
3 2 C
4 2 D

I tried using the filter function but I haven't got the result I am looking for.

Thanks.

score 3 · Answer 1 · answered Mar 06 '23 at 15:14


With base R

> subset(df, duplicated(df$x) | duplicated(df$x, fromLast = TRUE))
  x y
1 1 A
2 1 B
3 2 C
4 2 D
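
To see why both duplicated() calls are needed, here is a minimal sketch that splits the condition into its two parts. The names fwd, bwd, keep and result are only illustrative and do not appear in the answer.

```r
df <- data.frame(x = c(1, 1, 2, 2, 3, 4),
                 y = LETTERS[1:6])

# Flags the second and later occurrences of each value in x
fwd <- duplicated(df$x)

# Scans from the end, so it flags every occurrence except the last one
bwd <- duplicated(df$x, fromLast = TRUE)

# A row belongs to a repeated value if either flag is TRUE
keep <- fwd | bwd
result <- df[keep, ]
print(result)
```

Using duplicated(df$x) alone would drop the first row of each repeated value, which is why the fromLast = TRUE pass is combined with it.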

Jilber Urbina

Maël · Accepted Answer · 2023-03-06T15:15:03.283

2

You can use n() by group:

library(dplyr) # the .by argument requires dplyr 1.1.0 or above
df %>%
  filter(n() > 1, .by = x)

  x y
1 1 A
2 1 B
3 2 C
4 2 D

Or, in base R:

subset(df, ave(x, x, FUN = length) > 1)
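
As a rough illustration of what the ave() call computes, the sketch below stores the per-row group size in an intermediate vector before filtering. The names group_size and result are assumptions made for this example.

```r
df <- data.frame(x = c(1, 1, 2, 2, 3, 4),
                 y = LETTERS[1:6])

# For every row, the number of rows that share the same value of x
group_size <- ave(df$x, df$x, FUN = length)
print(group_size)

# Keep only rows whose value of x occurs more than once
result <- df[group_size > 1, ]
print(result)
```

Inside subset(), the bare x refers to the column df$x, so the one-liner and this expanded version should select the same rows.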

And in data.table:

library(data.table) # setDT() converts df to a data.table by reference
setDT(df)[, if (.N > 1) .SD, by = x]

edited Mar 06 '23 at 15:15

answered Mar 06 '23 at 15:13


score 1 · Answer 3 · answered Mar 06 '23 at 15:15

If we group by a single column, every group with n() > 1 contains values that are "duplicated" in that column:

library(dplyr)

df %>%
    group_by(x) %>%
    filter(n()>1) %>%
    ungroup()

# A tibble: 4 × 2
      x y    
  <dbl> <chr>
1     1 A    
2     1 B    
3     2 C    
4     2 D
