I am working with a table which is an extract of a set of other tables. All of the rows of the extract table should be unique according to keys D1, D2 and D3. They are not. It appears that an earlier developer attempted to solve this problem by using a SELECT DISTINCT
across all columns being queried from this table. This will work, but only if every row which is a duplicate on (D1, D2, D3) is also a duplicate across the non-key columns (ignoring the IDENTITY column that was added to the extract table).
In other words, given rows as follows:
D1 D2 D3 C4 C5 C6
=== === === === === ===
A B C X1 X2 X3
A B C X1 X2 X3
then
SELECT DISTINCT D1, D2, D3, C4, C5, C6
FROM BAD_TABLE
will "work", as there's no difference between the rows which are duplicated on (D1,D2,D3). But if the table contained
D1 D2 D3 C4 C5 C6
=== === === === === ===
A B C X1 X2 X3
A B C X1 X2 X4
then SELECT DISTINCT would return two rows for the key (A,B,C). Furthermore, we would have to decide which of X3 or X4 was the "correct" value.
I know how to find the duplicates on (D1,D2,D3). I even know how to find the duplicates across all the columns (other than the IDENTITY column):
;
WITH DUPLICATES(D1,D2,D3) AS
(
SELECT D1, D2, D3
FROM SOURCE
GROUP BY D1, D2, D3
HAVING COUNT(*)>1
)
SELECT S.D1, S.D2, S.D3, S.C4, S.C5, S.C6
FROM SOURCE S
INNER JOIN DUPLICATES D
ON S.D1 = D.D1 AND S.D2 = D.D2 AND S.D3 = D.D3
ORDER BY S.D1, S.D2, S.D3, S.C4, S.C5, S.C6
The question is, how do I find the subset of the above resultset which are duplicates on (D1,D2,D3), but not duplicates on (D1,D2,D3,C4,C5,C6)?