Get duplicate values within groups

Question

I'm trying to find duplicate values within groups. More specifically, I want to find child IDs that are duplicated within a household; the same child ID appearing in different households is fine. For example, I have a data frame called "village":

village <- data.frame(household = c(1, 1, 1, 2, 3, 3), 
                      children = c("A001", "A002", "A001", "A004", "A005", "A001"))

I want to get an output of:

expected <- data.frame(household = c(1, 1), 
                       children = c("A001", "A001"))

score 3 · Accepted Answer · answered Oct 03 '20 at 22:27

We can group by both 'household' and 'children', then keep only the groups that contain more than one row:

library(dplyr)
village %>% 
   group_by(household, children) %>% 
   filter(n() > 1) %>%
   ungroup()

Output:

# A tibble: 2 x 2
#  household children
#      <dbl> <chr>   
#1         1 A001    
#2         1 A001

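Or, in base R, use `duplicated()`. Combining it with a second call using `fromLast = TRUE` keeps every copy of a duplicated row, not only the later ones:

village[duplicated(village) | duplicated(village, fromLast = TRUE), ]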
#  household children
#1         1     A001
#3         1     A001
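
To see why the two `duplicated()` calls are combined, here is a small sketch on the question's data. The intermediate names `later` and `earlier` are only illustrative and do not appear in the answer.

```r
village <- data.frame(household = c(1, 1, 1, 2, 3, 3),
                      children = c("A001", "A002", "A001", "A004", "A005", "A001"))

# TRUE for a row that repeats an earlier row (scan runs top to bottom)
later <- duplicated(village)

# TRUE for a row that is repeated by a later row (scan runs bottom to top)
earlier <- duplicated(village, fromLast = TRUE)

# The union marks every copy of a duplicated row
village[later | earlier, ]
```

Using only `duplicated(village)` would drop the first occurrence and return a single row for this data.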

akrun


The group_by() answer is exactly what I need since I have more than those two columns in the actual dataset. Thank you! – Karen Liu Oct 03 '20 at 22:35
@KarenLiu you can use `duplicated` also if you subset i.e. `village[duplicated(village[1:2])|duplicated(village[1:2], fromLast = TRUE),]` – akrun Oct 03 '20 at 22:36
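
As a rough sketch of the point in the comment above, assume the real data has an extra column; `village2`, `age`, `key`, and `dups` are hypothetical names made up for illustration. Checking the whole data frame would find nothing here because the `age` values differ, so the check is restricted to the two key columns.

```r
village2 <- data.frame(household = c(1, 1, 1, 2, 3, 3),
                       children = c("A001", "A002", "A001", "A004", "A005", "A001"),
                       age = c(5, 7, 6, 4, 8, 5))  # hypothetical extra column

# Look for duplicates on the key columns only, selected by name
key <- village2[c("household", "children")]

# Keep all columns of the rows whose key is duplicated
dups <- village2[duplicated(key) | duplicated(key, fromLast = TRUE), ]
dups
```

Selecting the columns by name rather than by position (`1:2`) is a design choice that keeps the check stable if the column order changes.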

score 2 · Answer 2 · answered Oct 03 '20 at 22:32

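Another base R option uses `subset()` with `ave()`, which here returns, for every row, the number of rows sharing its household/children combination: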

subset(village, ave(household, household, children, FUN = length) > 1)
#  household children
#1         1     A001
#3         1     A001
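
A minimal sketch of what `ave()` computes in this call may help; the name `counts` is illustrative. The first argument only needs to be a numeric vector of the right length, and `household` is reused for that purpose because `FUN = length` ignores the values themselves.

```r
village <- data.frame(household = c(1, 1, 1, 2, 3, 3),
                      children = c("A001", "A002", "A001", "A004", "A005", "A001"))

# For each row, the size of its (household, children) group
counts <- ave(village$household, village$household, village$children,
              FUN = length)
counts

# Rows whose group has more than one member are the within-household duplicates
village[counts > 1, ]
```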

ThomasIsCoding
