filter duplicates from a data frame in r

Question

I have a dataframe with one observation per row and two observations per subject. I'd like to filter out just the rows with duplicate 'day' numbers.

ex <- data.frame('id'= rep(1:5,2), 'day'= c(1:5, 1:3,5:6))

The following code filters out just the second duplicated row, but not the first. Again, I'd like to filter out both of the duplicated rows.

ex %>% 
    group_by(id) %>% 
    filter(duplicated(day))

The following code works, but seems clunky. Does anyone have a more efficient solution?

ex %>% 
    group_by(id) %>% 
    filter(duplicated(day, fromLast = TRUE) | duplicated(day, fromLast = FALSE))

score 16 · Accepted Answer · answered Nov 04 '16 at 19:53

duplicated can be applied on the whole dataset and this can be done with just base R methods.

ex[duplicated(ex)|duplicated(ex, fromLast = TRUE),]

Using dplyr, we can group_by both the columns and filter only when the number of rows (n()) is greater than 1.

ex %>% 
     group_by(id, day) %>%
     filter(n()>1)

score 3 · Answer 2 · answered Nov 05 '16 at 08:15

3

Single tidyverse pipe:

exSinglesOnly <- 
    ex %>% 
    group_by(id,day) %>% # the complete group of interest
    mutate(duplicate = n()) %>% # count number in each group
    filter(duplicate == 1) %>% # select only unique records
    select(-duplicate) # remove group count column

> exSinglesOnly
Source: local data frame [4 x 2]
Groups: id, day [4]

     id   day
  <int> <int>
1     4     4
2     5     5
3     4     5
4     5     6

answered Nov 05 '16 at 08:15

leerssej

14,260
6
48
57

Oh, I hadn't seen @Akrun's second answer. This is a duplicate of that then, but I will leave it up to help explain the logic involved with the complete grouping and subsequent filtration. – leerssej Nov 05 '16 at 08:22

filter duplicates from a data frame in r

2 Answers2