I have a table with two columns of IDs, like so:
╔════════╦══════╗
║ Master ║ Dupe ║
╠════════╬══════╣
║ 2 ║ 7 ║
║ 3 ║ 6 ║
║ 6 ║ 7 ║
║ 20 ║ 25 ║
║ 75 ║ 25 ║
╚════════╩══════╝
Each row represents the IDs of two rows in a sql table that are deemed to be duplicates of each other.
This table can contain many thousands of entries, with no guarantees for the data other than the Master
column being sorted in ascending order as pictured. Either column could contain the same ID as the other column, potentially against different or the same partner ID. Again - no guarantees.
From this table I would like get an index of Master and all of its possible dupes. Like pictured below.
Desired outcomes:
- Lowest ID should be kept as master
- All subsequent dupes of a dupe should map back to the same (lowest ID) master
For the above, the desired output would look like so (but the columns don't HAVE to be sorted):
╔════════╦══════╗
║ Master ║ Dupe ║
╠════════╬══════╣
║ 2 ║ 3 ║
║ 2 ║ 6 ║
║ 2 ║ 7 ║
║ 20 ║ 25 ║
║ 20 ║ 75 ║
╚════════╩══════╝
I'm finding it difficult to explain this problem so my googling has not returned much. I'm thinking there must be an algorithm somewhere for iterating through a list of tuples like this and discovering duplication.
Any help appreciated!
EDIT: I've modified the example tables to better explain what their contents might look like.
Some notes to consider,
- There is no guarantee of a chain. It could all be one big chain, lots of small ones or none at all.
- There is no guarantee that all pairs appear in reverse order somewhere else in the table
From what I can see, the problem looks to be recursive, I think LukStorms is on the right track but I can't quite figure it out
ANSWERED: While both solutions below from @artm and @LukStorms seem to work, I found the latter to be a little more succinct and readable. Thank you both! Fantastic help on a tough question. I only wish I could award the answer to you both