0

I'm using SQL Server 2012 version 11.0.6020.0. Apologies in advance, I'm new to SQL.

I have one table for a person ID. Due to duplication, a person can have multiple ID's. In an attempt to clean this up, a master ID is created. However, there still exists duplications. Currently, it looks like this ...

IF OBJECT_ID('tempdb..#table1') IS NOT NULL
BEGIN
    DROP TABLE #table1
END

CREATE TABLE #table1 (MasterID varchar(1), PersonID1 varchar(3), PersonID2 varchar(3), PersonID3 varchar(3), PersonID4 varchar(3), PersonID5 varchar(3))

INSERT INTO #table1 VALUES ('A', '12', '34', '56', '78', null);
INSERT INTO #table1 VALUES ('B', '34', '12', '90', null, null);
INSERT INTO #table1 VALUES ('C', '777', '888', null, null, null);

The table looks like this when the code above is executed.

+----------+-----------+-----------+-----------+-----------+-----------+
| MasterID | PersonID1 | PersonID2 | PersonID3 | PersonID4 | PersonID5 |
+----------+-----------+-----------+-----------+-----------+-----------+
| A        |        12 |        34 |        56 |        78 |           |
| B        |        34 |        12 |        90 |           |           |
| C        |       777 |       888 |           |           |           |
+----------+-----------+-----------+-----------+-----------+-----------+

MasterID A and MasterID B is the same person because some of the PersonID overlap. MasterID C is a different person because it shares none of the ID's. If one ID is shared, then it's safe for me to assume that it is the same patient. So the output I want is ...

+----------+-----------+-----------+-----------+-----------+-----------+
| MasterID | PersonID1 | PersonID2 | PersonID3 | PersonID4 | PersonID5 |
+----------+-----------+-----------+-----------+-----------+-----------+
| A        |        12 |        34 |        56 |        78 |        90 |
| C        |       777 |       888 |           |           |           |
+----------+-----------+-----------+-----------+-----------+-----------+

I thought about unpivoting the data and grouping it.

IF OBJECT_ID('tempdb..#t1') IS NOT NULL
BEGIN
    DROP TABLE #t1
END

SELECT MasterID, PersonID
INTO #t1
FROM
(
    SELECT MasterID, PersonID1, PersonID2, PersonID3, PersonID4, PersonID5
    FROM #table1
) t1
UNPIVOT
(
    PersonID FOR PersonIDs IN (PersonID1, PersonID2, PersonID3, PersonID4, PersonID5)
) AS up

GO

---------------------------------------------------

SELECT min(MasterID) as MasterID, PersonID
FROM #t1
GROUP BY PersonID
ORDER BY 1, 2

However, this solution will leave me with this below where it looks like 90 is its own person.

+----------+-----------+
| MasterID | PersonID  |
+----------+-----------+
| A        |        12 |
| A        |        34 |
| A        |        56 |
| A        |        78 |
| B        |        90 |
| C        |       777 |
| C        |       888 |
+----------+-----------+

I looked through stack overflow and the closest solution I found is this but it involves two tables whereas mine is within the same table SQL UPDATE SET one column to be equal to a value in a related table referenced by a different column?

I also found this but the max aggregate function probably won't work for my case. Merge two rows in SQL

This solution looks like it'll work but it'll require me to manually check each field for duplicate PersonID first before updating my MasterID. set a row equal to another row in the same table apart the primary key column

My goal is to have SQL check for duplicates and if found, remove the duplicates and update add the new PersonID. And as for which masterID to use, it doesn't matter whether I keep A or B.

Let me know if you know of any solutions or can direct me to one. I'm new to SQL so I may be searching the wrong keywords and vocabularies. Thanks, I really appreciate it!

Community
  • 1
  • 1
Raymond
  • 7
  • 5

1 Answers1

1

Please try the following query. It adds a MainMasterID column to identify the main MasterID for each record.

select *, 
       (select min(MasterID) 
          from #table1 t2 
         where t1.PersonID1 in (t2.PersonID1, t2.PersonID2, t2.PersonID3, t2.PersonID4, t2.PersonID5)
            or t1.PersonID2 in (t2.PersonID1, t2.PersonID2, t2.PersonID3, t2.PersonID4, t2.PersonID5)
            or t1.PersonID3 in (t2.PersonID1, t2.PersonID2, t2.PersonID3, t2.PersonID4, t2.PersonID5)
            or t1.PersonID4 in (t2.PersonID1, t2.PersonID2, t2.PersonID3, t2.PersonID4, t2.PersonID5)
            or t1.PersonID5 in (t2.PersonID1, t2.PersonID2, t2.PersonID3, t2.PersonID4, t2.PersonID5)
         ) AS MainMasterID
 from #table1 t1

/* Sample data output

MasterID PersonID1 PersonID2 PersonID3 PersonID4 PersonID5 MainMasterID
-------- --------- --------- --------- --------- --------- ------------
A        12        34        56        78        NULL      A
B        34        12        90        NULL      NULL      A
C        777       888       NULL      NULL      NULL      C

*/
Wagner DosAnjos
  • 6,304
  • 1
  • 15
  • 29
  • Thanks so much for this! It's simple and elegant for a complex problem. I'm going to put down the remaining steps for others who may have this issue. Feel free to redo mine to make it more efficient. – Raymond Jan 06 '17 at 01:29
  • I ran the script mentioned above but into a table called #table2. Then I unpivoted it to collapse the duplicates and stored it in #table3 SELECT DISTINCT MainMasterID, PersonID INTO #table3 FROM ( SELECT MainMasterID, PersonID1, PersonID2, PersonID3, PersonID4, PersonID5 FROM #table2 ) t1 UNPIVOT ( PersonID FOR PersonIDs IN (PersonID1, PersonID2, PersonID3, PersonID4, PersonID5) ) AS up – Raymond Jan 06 '17 at 01:33
  • Then I pivoted the distinct values back. SELECT * FROM ( SELECT 'rn' + CAST(ROW_NUMBER() OVER(PARTITION BY MainMasterID ORDER BY PersonID) as varchar(max)) as rn, * FROM #table3) a PIVOT (min(PersonID) FOR rn IN (rn1, rn2, rn3, rn4, rn5)) as pvt – Raymond Jan 06 '17 at 01:34
  • I'm lost here, this doesn't remove the duplicate. It just adds a column – S3S Jan 06 '17 at 14:02
  • OP was stuck on how to identify the duplicates, which this answer provides. – Wagner DosAnjos Jan 06 '17 at 14:46
  • thanks wdosanjos and @scsimon . Yes, the solution doesn't remove the duplicates but it helped me establish the relationships between them. I don't know how to format the SQL query in the comment section but I provided the steps to remove duplicates. Basically, unpivoted the data so I'll have two columns, MainMasterID and PersonID. Then I ran a distinct and pivoted it back. It's a multistep process so I feel that it can be more efficient. But that does the job. – Raymond Jan 06 '17 at 17:46