I have scraped Google Maps data on businesses, and it contains many, many duplicates of both phone numbers and URLs. I need to create a variable that identifies groups of rows that overlap in phone number or URL, in both directions:
- Any URLs that share a phone number go in the same group.
- Any phone numbers that share a URL go in the same group.
- Recursively, so that chains of shared values are followed all the way through.
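One way to read these rules is as finding connected components in a bipartite graph: each phone number and each URL is a node, and each row is an edge between its phone and its URL. "Recursively" then just means "everything reachable". The sketch below assumes the data is a list of `(phone, url)` tuples; the function name `group_rows_bfs` is hypothetical. It uses breadth-first search to label each component.

```python
from collections import defaultdict, deque


def group_rows_bfs(rows):
    """Label each (phone, url) row with a group ID (hypothetical helper).

    Rows are linked when they share a phone or a URL; links are followed
    transitively by breadth-first search over a bipartite phone/URL graph.
    """
    # Tag keys with the column name so a phone string can never collide
    # with an identical URL string.
    adj = defaultdict(set)
    for phone, url in rows:
        adj[("phone", phone)].add(("url", url))
        adj[("url", url)].add(("phone", phone))

    component = {}  # node -> group ID
    next_id = 0
    for start in adj:
        if start in component:
            continue
        # Visit every node reachable from `start` and give it the same ID.
        component[start] = next_id
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in component:
                    component[neighbor] = next_id
                    queue.append(neighbor)
        next_id += 1

    # A row's phone node and URL node are always in the same component.
    return [component[("phone", phone)] for phone, _ in rows]


rows = [("111", "a"), ("222", "a"), ("333", "z")]
print(group_rows_bfs(rows))  # [0, 0, 1]
```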
So, in this example:

Phone | URL |
---|---|
111-111-1111 | a |
222-222-2222 | a |
222-222-2222 | b |
333-333-3333 | b |
333-333-3333 | c |
444-444-4444 | c |
444-444-4444 | d |
555-555-5555 | d |
999-999-9999 | z |
There should be only two groups: the first 8 rows, which are linked by overlapping phone numbers and URLs, and the last row, which shares nothing with any other row.
I feel like this is going to end up being a straightforward for loop, but I'm having trouble getting started.
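A single for loop over the rows is indeed enough if it feeds a union-find (disjoint-set) structure. Each row merges its phone with its URL, and rows whose phones end up under the same root form one group. This is a minimal sketch, assuming the rows are available as `(phone, url)` tuples; `group_rows_union_find` is a hypothetical name, and Python is chosen only for illustration since the question doesn't name a language.

```python
def group_rows_union_find(rows):
    """Assign a group ID to each (phone, url) row (hypothetical helper).

    One pass merges each row's phone with its URL in a union-find structure;
    rows whose phones share a root belong to the same group.
    """
    parent = {}

    def find(x):
        # Walk up to the root, halving the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # The single loop: link each row's phone to its URL.
    for phone, url in rows:
        union(("phone", phone), ("url", url))

    # Map arbitrary roots to consecutive IDs in order of first appearance.
    ids = {}
    return [ids.setdefault(find(("phone", phone)), len(ids)) for phone, _ in rows]


rows = [
    ("111-111-1111", "a"), ("222-222-2222", "a"), ("222-222-2222", "b"),
    ("333-333-3333", "b"), ("333-333-3333", "c"), ("444-444-4444", "c"),
    ("444-444-4444", "d"), ("555-555-5555", "d"), ("999-999-9999", "z"),
]
print(group_rows_union_find(rows))  # [0, 0, 0, 0, 0, 0, 0, 0, 1]
```

If, for example, the data sits in a pandas DataFrame with columns named `Phone` and `URL` as in the table, the rows can be passed as `list(zip(df["Phone"], df["URL"]))`. They must be materialized as a list because the function iterates over them twice. Union-find runs in near-linear time, so it should scale to large scraped datasets.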