
I'm trying to find duplicate values within groups. More specifically, I want to find child IDs that are duplicated within a household; the same child ID appearing in different households is fine. For example, I have a data frame called "village":

village <- data.frame(household = c(1, 1, 1, 2, 3, 3), 
                      children = c("A001", "A002", "A001", "A004", "A005", "A001"))

I want to get an output of:

expected <- data.frame(household = c(1, 1), 
                       children = c("A001", "A001"))
Karen Liu

2 Answers

We can group by both 'household' and 'children', then keep only the groups that contain more than one row:

library(dplyr)
village %>% 
   group_by(household, children) %>% 
   filter(n() > 1) %>%
   ungroup()

Output:

# A tibble: 2 x 2
#  household children
#      <dbl> <chr>   
#1         1 A001    
#2         1 A001   

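Or, in base R, use duplicated(). It compares entire rows, which works here because the data frame has only these two columns. The second call with fromLast = TRUE also flags the first occurrence of each duplicate:

village[duplicated(village) | duplicated(village, fromLast = TRUE), ]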
#  household children
#1         1     A001
#3         1     A001 
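
When the data frame has more columns, whole-row comparison only flags rows that match in every column. A minimal sketch of restricting the check to the key columns, assuming a hypothetical extra column `age` that is not part of the original data:

```r
# Same data as in the question, plus a hypothetical `age` column for illustration
village <- data.frame(household = c(1, 1, 1, 2, 3, 3),
                      children = c("A001", "A002", "A001", "A004", "A005", "A001"),
                      age = c(5, 7, 6, 3, 9, 5))

# Compare only the columns that define a duplicate
keys <- village[c("household", "children")]

# The forward pass marks later copies; the backward pass (fromLast = TRUE)
# marks the earlier ones, so every member of a duplicated pair is flagged
is_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

# All columns are kept in the result, including `age`
dups <- village[is_dup, ]
dups
```

Selecting the key columns by name rather than by position keeps the check valid if the column order changes.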
akrun
  • The group_by() answer is exactly what I need, since I have more than those two columns in the actual dataset. Thank you! – Karen Liu Oct 03 '20 at 22:35
  • @KarenLiu You can also use `duplicated` if you subset the columns, i.e. `village[duplicated(village[1:2])|duplicated(village[1:2], fromLast = TRUE),]` – akrun Oct 03 '20 at 22:36

Another base R option, using subset() with ave() to count the rows in each household/children group:

subset(village, ave(household, household, children, FUN = length) > 1)
#  household children
#1         1     A001
#3         1     A001
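
To make the intermediate step visible, here is a small sketch that computes the per-row group sizes first and filters afterwards. The names `group_size` and `dups` are introduced here for illustration only:

```r
village <- data.frame(household = c(1, 1, 1, 2, 3, 3),
                      children = c("A001", "A002", "A001", "A004", "A005", "A001"))

# For each row, count the rows sharing its (household, children) pair.
# The first argument only supplies a numeric vector of the right length;
# FUN = length replaces each value with the size of its group.
group_size <- ave(village$household, village$household, village$children,
                  FUN = length)
group_size

# Keep only rows whose pair occurs more than once
dups <- village[group_size > 1, ]
dups
```

Because ave() returns one value per input row, the result lines up with the data frame and can be used directly as a filter.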
ThomasIsCoding