I have a table with three columns: Surname, First Name and Address. I'm looking to match up families by finding people with the same surname AND the same address. I figured out how to use duplicated() to filter down to people who share a surname (or, run separately, an address), but not both at once. Here's my sample table:
Surname  First Name  Address
A1       Bobby       X1
B5       Joe         X2
B5       Mary        X3
F2       Lou         X4
F3       Sarah       X5
G4       Bobby       X6
G4       Fred        X6
G4       Anna        X6
H5       Eric        X7
K6       Peter       X8
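For anyone who wants to reproduce this, the table can be rebuilt roughly as follows. This is a sketch in base R: I'm assuming plain character columns, and check.names = FALSE is only there to keep the space in the "First Name" column name.

```r
# Rebuild the sample table from the question (base R only, no packages needed).
# Assumption: all three columns are plain character vectors.
sample <- data.frame(
  Surname      = c("A1", "B5", "B5", "F2", "F3", "G4", "G4", "G4", "H5", "K6"),
  `First Name` = c("Bobby", "Joe", "Mary", "Lou", "Sarah",
                   "Bobby", "Fred", "Anna", "Eric", "Peter"),
  Address      = c("X1", "X2", "X3", "X4", "X5", "X6", "X6", "X6", "X7", "X8"),
  check.names = FALSE,       # keep "First Name" instead of "First.Name"
  stringsAsFactors = FALSE   # keep strings as character, not factor
)

print(sample)
```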
And the code I used to filter it is:
library(dplyr)
# TRUE for every repeat of a surname after its first occurrence
duplicates <- duplicated(sample$Surname)
sample_surnames <- sample %>% filter(duplicates)
Here's the output of that code:
Surname  First Name  Address
B5       Mary        X3
G4       Fred        X6
G4       Anna        X6
The problem is twofold:
- This code drops the first instance of each duplicate. For example, Bobby, Fred and Anna should all be included, but Bobby is dropped.
- Is there a way to filter for duplicates in both the Surname and Address columns at once, or do I need to perform the operation twice? To be clear: I'm looking for instances where there's a duplicate in BOTH columns.
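To illustrate the first point in isolation, here is a small sketch using only the surname column, typed in by hand. duplicated() returns FALSE for the first occurrence of each value and TRUE only for later repeats, which is why the first member of each group goes missing. The variable names are just for illustration.

```r
# Surname column from the sample table, typed in by hand for illustration.
surnames <- c("A1", "B5", "B5", "F2", "F3", "G4", "G4", "G4", "H5", "K6")

# duplicated() marks only the repeats, never the first occurrence.
flags <- duplicated(surnames)

# Values that were flagged: one B5 and two G4, not two B5 and three G4.
flagged <- surnames[flags]
print(flagged)
# [1] "B5" "G4" "G4"

# Positions of the first occurrences that were NOT flagged even though
# the same surname appears again later (Joe at row 2, Bobby at row 6).
missed <- which(!flags & surnames %in% flagged)
print(missed)
# [1] 2 6
```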
Update: Please note, I'm not trying to remove the duplicates but rather to keep them. In this case, Bobby, Fred and Anna are the only ones who share both a Surname and an Address. Here is the table I'd like to get in the end:
Surname  First Name  Address
G4       Bobby       X6
G4       Fred        X6
G4       Anna        X6
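One approach that might produce this table, offered as a sketch rather than a definitive answer: apply duplicated() to the Surname and Address columns together, once forwards and once with fromLast = TRUE, and keep a row if either pass flags it. The names key, in_family and families are made up for this example, and the data frame is rebuilt here so the snippet runs on its own.

```r
# Sample table rebuilt so this snippet is self-contained.
sample <- data.frame(
  Surname      = c("A1", "B5", "B5", "F2", "F3", "G4", "G4", "G4", "H5", "K6"),
  `First Name` = c("Bobby", "Joe", "Mary", "Lou", "Sarah",
                   "Bobby", "Fred", "Anna", "Eric", "Peter"),
  Address      = c("X1", "X2", "X3", "X4", "X5", "X6", "X6", "X6", "X7", "X8"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Compare rows on the Surname/Address pair only, ignoring First Name.
key <- sample[c("Surname", "Address")]

# The forward pass flags later repeats; the backward pass (fromLast = TRUE)
# flags earlier ones. Together they mark every member of a repeated pair,
# including the first instance.
in_family <- duplicated(key) | duplicated(key, fromLast = TRUE)

families <- sample[in_family, ]
print(families)

# A dplyr version of the same idea (assuming dplyr is loaded) would be:
# families <- sample %>%
#   group_by(Surname, Address) %>%
#   filter(n() > 1) %>%
#   ungroup()
```

Joe and Mary share a surname but not an address, so neither pass flags them and they drop out, leaving only the three G4/X6 rows.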