
I have scraped Google Maps data on businesses, and it contains many duplicates of both phone numbers and URLs. I need to create a variable that identifies groups of rows where there is any overlap in phone number or URL, in either direction.

  • Any URLs that share a phone number go in the same group.
  • Any phone numbers that share a URL go in the same group.
  • Recursively, so that rows linked through a chain of shared phone numbers or URLs end up in the same group.

So in this example:

Phone URL
111-111-1111 a
222-222-2222 a
222-222-2222 b
333-333-3333 b
333-333-3333 c
444-444-4444 c
444-444-4444 d
555-555-5555 d
999-999-9999 z

There should be only two groups: the first eight rows, which are linked by overlapping phone numbers and URLs, and the last row, which has nothing in common with any other row.

I feel like this is going to end up being a straightforward for loop, but I'm having trouble getting started.
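
For reference, the grouping described above amounts to finding connected components in a graph where each phone number and each URL is a node and each row is an edge between them. Below is a minimal sketch of that idea using a union-find (disjoint-set) structure. The question does not name a language, so Python is an assumption here, and the function name `group_ids` and the variable `rows` are hypothetical names chosen only for illustration.

```python
def group_ids(rows):
    """Return one group id per (phone, url) row.

    Rows that share a phone number or a URL, directly or through a
    chain of other rows, receive the same id.
    """
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Tag each value with its column so a phone can never collide with a URL.
    for phone, url in rows:
        union(("phone", phone), ("url", url))

    # Number the components in order of first appearance.
    labels = {}
    ids = []
    for phone, _ in rows:
        root = find(("phone", phone))
        ids.append(labels.setdefault(root, len(labels) + 1))
    return ids


rows = [
    ("111-111-1111", "a"),
    ("222-222-2222", "a"),
    ("222-222-2222", "b"),
    ("333-333-3333", "b"),
    ("333-333-3333", "c"),
    ("444-444-4444", "c"),
    ("444-444-4444", "d"),
    ("555-555-5555", "d"),
    ("999-999-9999", "z"),
]
print(group_ids(rows))  # [1, 1, 1, 1, 1, 1, 1, 1, 2]
```

A single pass over the rows is enough to link everything, because union-find merges chains as they appear, so the "recursive" requirement does not need an explicit repeated loop.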

Henrik
SDYockey
