I want to find an efficient method of determining an entire hierarchy type relationship for a table of number pairs, then express that relationship in a vector, or string, so that I can determine other useful information about each pair's hierarchy, such as the highest related integer, lowest related integer and total number of related integers.
For example I have a table of integer pairs:
X Y
--- ---
5 10
5 11
11 12
11 13
13 3
20 18
17 18
50 18
20 21
A record is related to another record if any value in the pair is shared by any other value in another pair. The final table would look something like this:
X Y Related ID's
--- --- ---------------
5 10 3,5,10,11,12,13
5 11 3,5,10,11,12,13
11 12 3,5,10,11,12,13
11 13 3,5,10,11,12,13
13 3 3,5,10,11,12,13
20 18 17,18,20,21,50
17 18 17,18,20,21,50
50 18 17,18,20,21,50
20 21 17,18,20,21,50
What I have now is admittedly a mess. It uses a fuzzy_join with a matching function that takes x,y as a vector and does a match between them. That match then creates a larger vector of all four matching numbers, which goes back into the fuzzy_join to do the match again. This loops until there are no more matches. It gets terrible very quickly, and at about 4k records it just doesn't respond anymore. The entire initial table of pairs will stay < 100k records