Question: How do I efficiently get a list of people (#People table below, one person per row) that are matched as a set (#Keys table below), but also make sure that the sets are unique.
Background: I'm working with sets of matches in a database (in the form of KeyId, PersonId1, PersonId2). We have an automatic method that flags people as duplicates and writes them to a Match table. There aren't too many, but we usually sit with about 100K match records and 200K people. The match process has also added duplicate records in the form of matching Person 1 with 2, and also matching 2 with 1. Also, we logically delete people records (IsDeleted = 1) so do not want to return matches where one person has already been deleted.
We have an administration screen where people can look at the duplicates and flag whether they aren't dupes, or delete one of the pair. There has been a problem where even if one person in the pair was deleted, the other one was still showing in the list. The SQL below is an attempt to make sure that only people that exist as a set are returned.
Test Data Setup:
CREATE TABLE #Keys
(
KeyId int PRIMARY KEY,
PersonId1 int,
PersonId2 int
)
CREATE TABLE #People
(
PersonId int PRIMARY KEY,
Name varchar(150),
IsDeleted bit
)
INSERT INTO #People
VALUES (1, 'John',0),
(2, 'Madeline',0),
(3, 'Ralph',1),
(4, 'Sarah',0),
(5, 'Jack',0),
(6, 'Olivia',0),
(7, 'Ethan',0),
(8, 'Sophia',0)
INSERT INTO #Keys
VALUES (1,1,2),
(2,2,3),
(3,1,3),
(4,2,1),
(5,4,8),
(6,3,7),
(7,6,1)
SELECT *
FROM #Keys k
JOIN #People p1
ON k.PersonId1 = p1.PersonId
AND p1.IsDeleted = 0
JOIN #People p2
ON k.PersonId2 = p2.PersonId
AND p2.IsDeleted = 0
Returns:
KeyId PersonId1 PersonId2 PersonId Name IsDeleted PersonId Name IsDeleted
1 1 2 1 John 0 2 Madeline 0
4 2 1 2 Madeline 0 1 John 0
5 4 8 4 Sarah 0 8 Sophia 0
7 6 1 6 Olivia 0 1 John 0
SELECT KeyId, p1.PersonId, p1.Name
INTO #Results
FROM #Keys k
JOIN #People p1
ON k.PersonId1 = p1.PersonId
AND p1.IsDeleted = 0
JOIN #People p2
ON k.PersonId2 = p2.PersonId
AND p2.IsDeleted = 0
INSERT INTO #Results
SELECT KeyId, p2.PersonId, p2.Name
FROM #Keys k
JOIN #People p1
ON k.PersonId1 = p1.PersonId
AND p1.IsDeleted = 0
JOIN #People p2
ON k.PersonId2 = p2.PersonId
AND p2.IsDeleted = 0
SELECT * from #Results
order by KeyId
DROP TABLE #People
DROP TABLE #Keys
DROP TABLE #Results
The final query returns this set:
KeyId PersonId Name
1 2 Madeline
1 1 John
4 2 Madeline
4 1 John
5 8 Sophia
5 4 Sarah
7 6 Olivia
7 1 John
But it has the problem that Keys 1 and 4 have the same people, just reversed in order. The set I'd like returned is:
KeyId PersonId Name
1 2 Madeline
1 1 John
5 4 Sarah
5 8 Sophia
7 1 John
7 6 Olivia