I'm using SQL Server 2012 version 11.0.6020.0. Apologies in advance, I'm new to SQL.
I have one table for a person ID. Due to duplication, a person can have multiple ID's. In an attempt to clean this up, a master ID is created. However, there still exists duplications. Currently, it looks like this ...
IF OBJECT_ID('tempdb..#table1') IS NOT NULL
BEGIN
DROP TABLE #table1
END
CREATE TABLE #table1 (MasterID varchar(1), PersonID1 varchar(3), PersonID2 varchar(3), PersonID3 varchar(3), PersonID4 varchar(3), PersonID5 varchar(3))
INSERT INTO #table1 VALUES ('A', '12', '34', '56', '78', null);
INSERT INTO #table1 VALUES ('B', '34', '12', '90', null, null);
INSERT INTO #table1 VALUES ('C', '777', '888', null, null, null);
The table looks like this when the code above is executed.
+----------+-----------+-----------+-----------+-----------+-----------+
| MasterID | PersonID1 | PersonID2 | PersonID3 | PersonID4 | PersonID5 |
+----------+-----------+-----------+-----------+-----------+-----------+
| A | 12 | 34 | 56 | 78 | |
| B | 34 | 12 | 90 | | |
| C | 777 | 888 | | | |
+----------+-----------+-----------+-----------+-----------+-----------+
MasterID A and MasterID B is the same person because some of the PersonID overlap. MasterID C is a different person because it shares none of the ID's. If one ID is shared, then it's safe for me to assume that it is the same patient. So the output I want is ...
+----------+-----------+-----------+-----------+-----------+-----------+
| MasterID | PersonID1 | PersonID2 | PersonID3 | PersonID4 | PersonID5 |
+----------+-----------+-----------+-----------+-----------+-----------+
| A | 12 | 34 | 56 | 78 | 90 |
| C | 777 | 888 | | | |
+----------+-----------+-----------+-----------+-----------+-----------+
I thought about unpivoting the data and grouping it.
IF OBJECT_ID('tempdb..#t1') IS NOT NULL
BEGIN
DROP TABLE #t1
END
SELECT MasterID, PersonID
INTO #t1
FROM
(
SELECT MasterID, PersonID1, PersonID2, PersonID3, PersonID4, PersonID5
FROM #table1
) t1
UNPIVOT
(
PersonID FOR PersonIDs IN (PersonID1, PersonID2, PersonID3, PersonID4, PersonID5)
) AS up
GO
---------------------------------------------------
SELECT min(MasterID) as MasterID, PersonID
FROM #t1
GROUP BY PersonID
ORDER BY 1, 2
However, this solution will leave me with this below where it looks like 90 is its own person.
+----------+-----------+
| MasterID | PersonID |
+----------+-----------+
| A | 12 |
| A | 34 |
| A | 56 |
| A | 78 |
| B | 90 |
| C | 777 |
| C | 888 |
+----------+-----------+
I looked through stack overflow and the closest solution I found is this but it involves two tables whereas mine is within the same table SQL UPDATE SET one column to be equal to a value in a related table referenced by a different column?
I also found this but the max aggregate function probably won't work for my case. Merge two rows in SQL
This solution looks like it'll work but it'll require me to manually check each field for duplicate PersonID first before updating my MasterID. set a row equal to another row in the same table apart the primary key column
My goal is to have SQL check for duplicates and if found, remove the duplicates and update add the new PersonID. And as for which masterID to use, it doesn't matter whether I keep A or B.
Let me know if you know of any solutions or can direct me to one. I'm new to SQL so I may be searching the wrong keywords and vocabularies. Thanks, I really appreciate it!