Matching records across multiple possible IDs

Question

I have multiple records with sparsely populated identifiers (I will call these ID Numbers). I can have a maximum of two different ID Numbers per record and want to be able to traverse all the related records together so that I can create a single shared identifier. I want to achieve this in a T-SQL query.

Essentially, here is some sample data:

+-------+-------+--------+-----+------+
| RowId |  ID1  |  ID2   | ID3 | ID4  |
+-------+-------+--------+-----+------+
|     1 | 11111 |        |     |      |
|     2 | 11111 |        |     |      |
|     3 | 11111 | AAAAA  |     |      |
|     4 |       | BBBBBB | BC1 |      |
|     5 |       |        | BC1 | O111 |
|     6 |       | GGGGG  | BC1 |      |
|     7 |       | AAAAA  |     | O111 |
|     8 |       | CCCCCC |     |      |
|     9 | 99999 |        |     |      |
|    10 | 99999 | DDDDDD |     |      |
|    11 |       |        |     | O222 |
|    12 |       | EEEEEE |     | O222 |
|    13 |       | EEEEEE |     | O333 |
+-------+-------+--------+-----+------+

So for example, 11111 is linked to AAAAA in RowId 3, and AAAAA is also linked to O111 in RowId 7. O111 is linked to BC1 in RowId 5. BC1 is linked to BBBBBB in RowId 4, etc. Once all of these rows are linked, I want to create a new single identifier for them.

Here is the output I want to achieve for all of the data above:

Denormalised:
+---------+-------+--------+-----+------+
| GroupId |  ID1  |  ID2   | ID3 | ID4  |
+---------+-------+--------+-----+------+
|       1 | 11111 | AAAAA  | BC1 | O111 |
|       1 | 11111 | BBBBBB | BC1 | O111 |
|       1 | 11111 | GGGGG  | BC1 | O111 |
|       2 |       | CCCCCC |     |      |
|       3 | 99999 | DDDDDD |     |      |
|       4 |       | EEEEEE |     | O222 |
|       4 |       | EEEEEE |     | O333 |
+---------+-------+--------+-----+------+


Normalized (probably better to work with): 

+--------+----------+---------+
| IDType | IDNumber | GroupId |
+--------+----------+---------+
| ID1    | 11111    |       1 |
| ID2    | AAAAA    |       1 |
| ID2    | BBBBBB   |       1 |
| ID2    | GGGGG    |       1 |
| ID3    | BC1      |       1 |
| ID4    | O111     |       1 |
| ID2    | CCCCCC   |       2 |
| ID1    | 99999    |       3 |
| ID2    | DDDDDD   |       3 |
| ID2    | EEEEEE   |       4 |
| ID4    | O222     |       4 |
| ID4    | O333     |       4 |
+--------+----------+---------+

I am looking for SQL code to generate the output above or similar normalized structure. Thanks.

EDIT: Here is some code to create data that matches the sample data in the table above.

DROP TABLE IF EXISTS #ID
CREATE TABLE #ID
    (
        RowId   INT,
        ID1 VARCHAR(100),
        ID2 VARCHAR(100),
        ID3 VARCHAR(100),
        ID4 VARCHAR(100)
    )

INSERT INTO #ID VALUES 
    (1,'11111',NULL,NULL,NULL),
    (2,'11111',NULL,NULL,NULL),
    (3,'11111','AAAAA',NULL,NULL),
    (4,NULL,'BBBBBB','BC1',NULL),
    (5,NULL,NULL,'BC1','O111'),
    (6,NULL,'GGGGG','BC1',NULL),
    (7,NULL,'AAAAA',NULL,'O111'),
    (8,NULL,'CCCCCC',NULL,NULL),
    (9,'99999',NULL,NULL,NULL),
    (10,'99999','DDDDDD',NULL,NULL),
    (11,NULL,NULL,NULL,'O222'),
    (12,NULL,'EEEEEE',NULL,'O222'),
    (13,NULL,'EEEEEE',NULL,'O333')

Good question. I can see doing this in PostgreSQL, since it implements `UNION` (in addition to `UNION ALL`) in recursive CTEs. — The Impaler, Aug 03 '19 at 15:01
`11111` and `BBBBBB` are never linked in the sample data. Same as `11111` and `GGGGG`. I think second and third row in the expected result should have empty `ID1`. Your problem looks similar to [How to find all connected subgraphs of an undirected graph](https://stackoverflow.com/questions/35254260/how-to-find-all-connected-subgraphs-of-an-undirected-graph) — Vladimir Baranov, Aug 03 '19 at 15:08
@VladimirBaranov: 11111 and BBBBBB are linked by BC1 at row 4, because BC1 is linked to O111 at row 5, which is linked to AAAAA in row 7, which is linked to 11111 in row 3. The same applies to GGGGG — Antonio Veneroso Contreras, Aug 03 '19 at 15:33
@SQL-Fan . . . Can you set up a db<>fiddle? Or at least express the data as `insert` statements. — Gordon Linoff, Aug 03 '19 at 16:12

Vladimir Baranov · Answer 1 · 2019-08-09T02:06:34.680

It is easy to get your normalized output.

I'm using my query from How to find all connected subgraphs of an undirected graph, with a minor modification to convert your data into pairs that define the edges of a graph. The query treats the data as edges in a graph and recursively traverses all edges, stopping when a loop is detected. Then it collects everything reachable from each identifier into a group and gives each group a number.

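Your source table has four ID columns, but each row can have at most two of them populated, so each row can be treated as a pair of IDs (a row with a single ID pairs that ID with itself). My query expects this kind of data (pairs of IDs). It is easy to convert four IDs into a pair: use COALESCE, once in the order ID1 to ID4 and once in the reverse order.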
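
Because the COALESCE pairing silently ignores a third or fourth ID on a row, it may be worth checking that assumption first. A minimal sketch, assuming the #ID temp table from the question exists in the current session:

```sql
-- Sketch: list rows that break the "at most two IDs per row" assumption.
-- Assumes the #ID temp table from the question exists in the current session.
SELECT RowId, ID1, ID2, ID3, ID4
FROM #ID
WHERE   CASE WHEN ID1 IS NOT NULL THEN 1 ELSE 0 END
      + CASE WHEN ID2 IS NOT NULL THEN 1 ELSE 0 END
      + CASE WHEN ID3 IS NOT NULL THEN 1 ELSE 0 END
      + CASE WHEN ID4 IS NOT NULL THEN 1 ELSE 0 END > 2;
```

For the sample data in the question this returns no rows, so every row can be reduced to a pair.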

For detailed explanation of how it works, see How to find all connected subgraphs of an undirected graph.

Query

-- Reads from the #ID temp table created in the question.
WITH
CTE_Idents
AS
(
    -- Every identifier together with the column it came from
    SELECT ID1 AS Ident, 'ID1' AS IDType
    FROM #ID

    UNION

    SELECT ID2 AS Ident, 'ID2' AS IDType
    FROM #ID

    UNION

    SELECT ID3 AS Ident, 'ID3' AS IDType
    FROM #ID

    UNION

    SELECT ID4 AS Ident, 'ID4' AS IDType
    FROM #ID
)
,CTE_Pairs
AS
(
    -- Each row becomes an edge, listed in both directions
    SELECT COALESCE(ID1, ID2, ID3, ID4) AS Ident1, COALESCE(ID4, ID3, ID2, ID1) AS Ident2
    FROM #ID

    UNION

    SELECT COALESCE(ID4, ID3, ID2, ID1) AS Ident1, COALESCE(ID1, ID2, ID3, ID4) AS Ident2
    FROM #ID
)
,CTE_Recursive
AS
(
    SELECT
        CAST(CTE_Idents.Ident AS varchar(8000)) AS AnchorIdent 
        , Ident1
        , Ident2
        , CAST(',' + Ident1 + ',' + Ident2 + ',' AS varchar(8000)) AS IdentPath
        , 1 AS Lvl
    FROM 
        CTE_Pairs
        INNER JOIN CTE_Idents ON CTE_Idents.Ident = CTE_Pairs.Ident1

    UNION ALL

    SELECT 
        CTE_Recursive.AnchorIdent 
        , CTE_Pairs.Ident1
        , CTE_Pairs.Ident2
        , CAST(CTE_Recursive.IdentPath + CTE_Pairs.Ident2 + ',' AS varchar(8000)) AS IdentPath
        , CTE_Recursive.Lvl + 1 AS Lvl
    FROM
        CTE_Pairs
        INNER JOIN CTE_Recursive ON CTE_Recursive.Ident2 = CTE_Pairs.Ident1
    WHERE
        CTE_Recursive.IdentPath NOT LIKE CAST('%,' + CTE_Pairs.Ident2 + ',%' AS varchar(8000))
)
,CTE_RecursionResult
AS
(
    SELECT AnchorIdent, Ident1, Ident2
    FROM CTE_Recursive
)
,CTE_CleanResult
AS
(
    SELECT AnchorIdent, Ident1 AS Ident
    FROM CTE_RecursionResult

    UNION

    SELECT AnchorIdent, Ident2 AS Ident
    FROM CTE_RecursionResult
)
SELECT
    CTE_Idents.IDType
    ,CTE_Idents.Ident
    ,CASE WHEN CA_Data.XML_Value IS NULL 
    THEN CTE_Idents.Ident ELSE CA_Data.XML_Value END AS GroupMembers
    ,DENSE_RANK() OVER(ORDER BY 
        CASE WHEN CA_Data.XML_Value IS NULL 
        THEN CTE_Idents.Ident ELSE CA_Data.XML_Value END
    ) AS GroupID
FROM
    CTE_Idents
    CROSS APPLY
    (
        SELECT CTE_CleanResult.Ident+','
        FROM CTE_CleanResult
        WHERE CTE_CleanResult.AnchorIdent = CTE_Idents.Ident
        ORDER BY CTE_CleanResult.Ident FOR XML PATH(''), TYPE
    ) AS CA_XML(XML_Value)
    CROSS APPLY
    (
        SELECT CA_XML.XML_Value.value('.', 'NVARCHAR(MAX)')
    ) AS CA_Data(XML_Value)
WHERE
    CTE_Idents.Ident IS NOT NULL
ORDER BY GroupID, IDType, Ident;

Result

+--------+--------+------------------------------------+---------+
| IDType | Ident  |            GroupMembers            | GroupID |
+--------+--------+------------------------------------+---------+
| ID1    | 11111  | 11111,AAAAA,BBBBBB,BC1,GGGGG,O111, |       1 |
| ID2    | AAAAA  | 11111,AAAAA,BBBBBB,BC1,GGGGG,O111, |       1 |
| ID2    | BBBBBB | 11111,AAAAA,BBBBBB,BC1,GGGGG,O111, |       1 |
| ID2    | GGGGG  | 11111,AAAAA,BBBBBB,BC1,GGGGG,O111, |       1 |
| ID3    | BC1    | 11111,AAAAA,BBBBBB,BC1,GGGGG,O111, |       1 |
| ID4    | O111   | 11111,AAAAA,BBBBBB,BC1,GGGGG,O111, |       1 |
| ID1    | 99999  | 99999,DDDDDD,                      |       2 |
| ID2    | DDDDDD | 99999,DDDDDD,                      |       2 |
| ID2    | CCCCCC | CCCCCC,                            |       3 |
| ID2    | EEEEEE | EEEEEE,O222,O333,                  |       4 |
| ID4    | O222   | EEEEEE,O222,O333,                  |       4 |
| ID4    | O333   | EEEEEE,O222,O333,                  |       4 |
+--------+--------+------------------------------------+---------+

This is what your data looks like as a graph:

[image: graph of the sample data]

I rendered this image using DOT from https://www.graphviz.org/.

How do you convert this normalized output into the denormalized form? One way is to pivot it with the help of IDType, though it might get tricky if the graph can have several loops. It would be better to ask another question specifically about converting a normalized dataset into a denormalized one.
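
As a possible starting point (not the full denormalized layout from the question), each source row can be tagged with its group. This is only a sketch: #Groups is a hypothetical temp table holding the normalized result, filled here with the rows from the question's normalized table, and #ID is assumed to exist as created in the question.

```sql
-- Hypothetical table holding the normalized result (IDType, IDNumber, GroupId);
-- the rows are copied from the normalized output shown in the question.
DROP TABLE IF EXISTS #Groups;
CREATE TABLE #Groups (IDType VARCHAR(3), IDNumber VARCHAR(100), GroupId INT);

INSERT INTO #Groups VALUES
    ('ID1','11111',1), ('ID2','AAAAA',1), ('ID2','BBBBBB',1),
    ('ID2','GGGGG',1), ('ID3','BC1',1),   ('ID4','O111',1),
    ('ID2','CCCCCC',2),
    ('ID1','99999',3), ('ID2','DDDDDD',3),
    ('ID2','EEEEEE',4), ('ID4','O222',4), ('ID4','O333',4);

-- Unpivot each source row, look up the group of any of its IDs,
-- and collapse duplicates so each distinct source row appears once.
SELECT DISTINCT g.GroupId, i.ID1, i.ID2, i.ID3, i.ID4
FROM #ID AS i
CROSS APPLY (VALUES ('ID1', i.ID1), ('ID2', i.ID2),
                    ('ID3', i.ID3), ('ID4', i.ID4)) AS v(IDType, IDNumber)
INNER JOIN #Groups AS g
    ON g.IDType = v.IDType
   AND g.IDNumber = v.IDNumber
ORDER BY g.GroupId;
```

NULL columns never match in the join, so a row is matched only through the IDs it actually has. Filling in the missing columns per group, as in the question's denormalized table, would need further work on top of this.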

Bounty rewarded for being the only answer to get the desired results. This solution is quite cumbersome and resembles in many ways my attempts to solve this problem. I was hoping for a simpler solution but I'm guessing it is what it is. — Zohar Peled, Aug 11 '19 at 07:59
You are correct, this solution is closer than mine and avoiding some unnecessary steps, upvote from my side. — Shnugo, Aug 13 '19 at 11:25

score 1 · Answer 2 · answered Aug 06 '19 at 17:00

Well, this was a real brain twister ;-) and my solution only gets close... Try this:

General remarks:

I do not think that T-SQL is the right tool for this...
This structure is open to deeply nested chains. Although there are only 4 ID columns, the references can lead to unlimited depth, circles and loops.
This is, in a way, a gaps-and-islands problem.

The query

WITH cte AS
(
    SELECT RowId
          ,A.ID
          ,A.sourceId
          ,ROW_NUMBER() OVER(PARTITION BY RowId ORDER BY A.SourceId) AS IdCounter
    FROM #ID
    CROSS APPLY (VALUES('ID1',ID1),('ID2',ID2),('ID3',ID3),('ID4',ID4)) A(sourceId,ID)
    WHERE A.ID IS NOT NULL
)
,AllIDs AS
(
    SELECT RowId
          ,MAX(CASE WHEN IdCounter=1 THEN ID END) AS FirstId
          ,MAX(CASE WHEN IdCounter=1 THEN sourceId END) AS FirstSource
          ,MAX(CASE WHEN IdCounter=2 THEN ID END) AS SecondId
          ,MAX(CASE WHEN IdCounter=2 THEN sourceId END) AS SecondSource
    FROM cte
    GROUP BY RowId
)
,recCTE AS
(
    SELECT RowId
          ,FirstId
          ,FirstSource
          ,SecondId
          ,SecondSource 
          ,CAST(N'|' + FirstId AS NVARCHAR(MAX)) AS RunningPath
    FROM AllIDs WHERE SecondId IS NULL
    UNION ALL
    SELECT ai.RowId
          ,ai.FirstId
          ,ai.FirstSource
          ,ai.SecondId
          ,ai.SecondSource
          ,r.RunningPath + CAST(N'|' + ai.FirstId AS NVARCHAR(MAX))
    FROM AllIDs ai
    INNER JOIN recCTE r ON ai.RowId<>r.RowId AND (ai.FirstId=r.FirstId OR ai.FirstId=r.SecondId OR ai.SecondId=r.FirstId OR ai.SecondId=r.SecondId )
    WHERE r.RunningPath NOT LIKE CONCAT('%|',ai.FirstId,'|%') 
)
,FindIslands AS
(
    SELECT FirstId
          ,FirstSource
          ,SecondId
          ,SecondSource
          ,CONCAT(CanonicalPath,'|') AS CanonicalPath
    FROM recCTE 
    CROSS APPLY(SELECT CAST('<x>' + REPLACE(CONCAT(RunningPath,'|',SecondId),'|','</x><x>') + '</x>' AS XML)) A(Casted)
    CROSS APPLY(SELECT Casted.query('
                        for $x in distinct-values(/x[text()])
                        order by $x
                        return <x>{concat("|",$x)}</x>
                        ').value('.','nvarchar(max)')) B(CanonicalPath)
)
,MaxPaths AS
(
    SELECT fi.CanonicalPath
          ,x.CanonicalPath AS BestPath
          ,LEN(x.CanonicalPath) AS PathLength
          ,ROW_NUMBER() OVER(PARTITION BY fi.CanonicalPath ORDER BY LEN(x.CanonicalPath) DESC) AS SortIndex 
    FROM FindIslands fi
    INNER JOIN FindIslands x ON LEN(x.CanonicalPath)>=LEN(fi.CanonicalPath) AND x.CanonicalPath LIKE CONCAT('%',fi.CanonicalPath,'%' )
    --GROUP BY fi.CanonicalPath
)
,AlmostCorrect AS 
( 
    SELECT *
    FROM
    (
        SELECT mp.BestPath,fi.FirstId AS ID,FirstSource AS IDSource
        FROM FindIslands fi
        INNER JOIN MaxPaths mp On mp.SortIndex=1 AND fi.CanonicalPath=mp.CanonicalPath
        UNION ALL
        SELECT mp.BestPath,fi.SecondId,SecondSource
        FROM FindIslands fi
        INNER JOIN MaxPaths mp On mp.SortIndex=1 AND fi.CanonicalPath=mp.CanonicalPath
    ) t
    WHERE ID IS NOT NULL
    GROUP BY BestPath,ID,IDSource
)
SELECT * FROM AlmostCorrect;

The result

+--------------------------------+--------+----------+
| BestPath                       | ID     | IDSource |
+--------------------------------+--------+----------+
| |11111|AAAAA|BBBBBB|BC1|GGGGG| | 11111  | ID1      |
+--------------------------------+--------+----------+
| |11111|AAAAA|BBBBBB|BC1|GGGGG| | AAAAA  | ID2      |
+--------------------------------+--------+----------+
| |11111|AAAAA|BBBBBB|BC1|GGGGG| | BBBBBB | ID2      |
+--------------------------------+--------+----------+
| |11111|AAAAA|BBBBBB|BC1|GGGGG| | BC1    | ID3      |
+--------------------------------+--------+----------+
| |11111|AAAAA|BBBBBB|BC1|GGGGG| | GGGGG  | ID2      |
+--------------------------------+--------+----------+
| |11111|AAAAA|BC1|GGGGG|        | BC1    | ID3      |
+--------------------------------+--------+----------+
| |11111|AAAAA|BC1|GGGGG|        | GGGGG  | ID2      |
+--------------------------------+--------+----------+
| |11111|AAAAA|BC1|O111|         | BC1    | ID3      |
+--------------------------------+--------+----------+
| |11111|AAAAA|BC1|O111|         | O111   | ID4      |
+--------------------------------+--------+----------+
| |11111|AAAAA|O111|             | AAAAA  | ID2      |
+--------------------------------+--------+----------+
| |11111|AAAAA|O111|             | O111   | ID4      |
+--------------------------------+--------+----------+
| |99999|DDDDDD|                 | 99999  | ID1      |
+--------------------------------+--------+----------+
| |99999|DDDDDD|                 | DDDDDD | ID2      |
+--------------------------------+--------+----------+
| |CCCCCC|                       | CCCCCC | ID2      |
+--------------------------------+--------+----------+
| |EEEEEE|O222|O333|             | EEEEEE | ID2      |
+--------------------------------+--------+----------+
| |EEEEEE|O222|O333|             | O222   | ID4      |
+--------------------------------+--------+----------+
| |EEEEEE|O222|O333|             | O333   | ID4      |
+--------------------------------+--------+----------+

The idea behind:

You can see the result of each intermediate step simply by using SELECT * FROM [cte-name] as the last select (comment out the current last select).

The CTE "cte" will transform your side-by-side structure to a row-based set.
Following your statement that you have a maximum of two different ID Numbers per record, the second CTE "AllIDs" will transform this set to a set with two IDs, keeping knowledge of where each ID was taken from.
Now we go into recursion. We start with all IDs, where the second ID is NULL (WARNING, You might not catch all, the recursion anchor might need some more thinking) and find any linked row (either by ID1 or by ID2). While traversing down we create a path of all visited IDs and we stop, if we re-visit one of them.
The cte "FindIslands" will transform this path to XML and use XQuery's FLWOR in order to return the path alphabetically sorted.
The cte "MaxPaths" will find the longest path of a group in order to find paths which are completely embedded within other paths.
The cte "AlmostCorrect" will now re-transform this to a row-based set and pick the rows with the longest path.

What we have achieved:

All your IDs show the same "IDSource" as your own example.
You can see, how the IDs are linked with each other.

What we did not yet achieve:

The paths |11111|AAAAA|BBBBBB|BC1|GGGGG|, |11111|AAAAA|BC1|GGGGG|, |11111|AAAAA|BC1|O111|, |11111|AAAAA|O111| are treated as different, although their fragments are overlapping.

At the moment I'm too tired to think about this... I might get an idea tomorrow ;-)

I wrote in a comment that this problem is similar to [How to find all connected subgraphs of an undirected graph](https://stackoverflow.com/questions/35254260/how-to-find-all-connected-subgraphs-of-an-undirected-graph). In fact, it is identical. I added it as an answer. I think you were overcomplicating things with looking for longest path. — Vladimir Baranov, Aug 09 '19 at 01:55

The Impaler · Answer 3 · 2019-08-03T17:48:15.447

I don't quite understand the structure of the expected result, but the key of your query is to assemble the nodes into subgraphs, while giving each subgraph an ID (you call it GroupId).

I leave the final rendering of the result to you since you probably understand in detail why you want to show it in that way. A few LEFT JOINs will do the trick.

Anyway, here's the query that produces the subgraphs:

with
p as (
  select
    row_id, row_id as min_id,
    cast(concat(':', row_id, ':') as varchar(1000)) as walked,
    case when id1 is null then ':' else cast(concat(':', id1, ':') as varchar(1000)) end as i1,
    case when id2 is null then ':' else cast(concat(':', id2, ':') as varchar(1000)) end as i2,
    case when id3 is null then ':' else cast(concat(':', id3, ':') as varchar(1000)) end as i3,
    case when id4 is null then ':' else cast(concat(':', id4, ':') as varchar(1000)) end as i4
  from t
  union all
  select
    t.row_id, case when t.row_id < p.min_id then t.row_id else p.min_id end,
    cast(concat(walked, t.row_id, ':') as varchar(1000)),
    case when t.id1 is null then p.i1 else cast(concat(p.i1, id1, ':') as varchar(1000)) end,
    case when t.id2 is null then p.i2 else cast(concat(p.i2, id2, ':') as varchar(1000)) end,
    case when t.id3 is null then p.i3 else cast(concat(p.i3, id3, ':') as varchar(1000)) end,
    case when t.id4 is null then p.i4 else cast(concat(p.i4, id4, ':') as varchar(1000)) end
  from p
  join t on p.i1 like concat('%:', t.id1, ':%')
         or p.i2 like concat('%:', t.id2, ':%')
         or p.i3 like concat('%:', t.id3, ':%')
         or p.i4 like concat('%:', t.id4, ':%')
  where p.walked not like concat('%:', t.row_id, ':%')
),
g as (
  select min_id as min_id, min(walked) as nodes
  from p
  where not exists (
    select 1
    from t 
    where (p.i1 like concat('%:', t.id1, ':%')
        or p.i2 like concat('%:', t.id2, ':%')
        or p.i3 like concat('%:', t.id3, ':%')
        or p.i4 like concat('%:', t.id4, ':%'))
       and p.walked not like concat('%:', t.row_id, ':%')
  )
  group by min_id
)
select row_number() over(order by min_id) as group_id, nodes from g

Result:

group_id  nodes          
--------  ---------------
1         :1:2:3:7:5:4:6:                                     
2         :8:            
3         :10:9:         
4         :11:12:13:

For reference, here's the data script I used to test:

create table t (
  row_id int,
  id1 int,
  id2 varchar(10),
  id3 varchar(10),
  id4 varchar(10)
);

insert into t (row_id, id1, id2, id3, id4) values 
  (1,  '11111', null,     null,  null),
  (2,  '11111', null,     null,  null),
  (3,  '11111', 'AAAAA',  null,  null),
  (4,  null,    'BBBBB',  'BC1', null),
  (5,  null,    null,     'BC1', '0111'),
  (6,  null,    'GGGGG',  'BC1', null),
  (7,  null,    'AAAAA',  null,  '0111'),
  (8,  null,    'CCCCCC', null,  null),
  (9,  '99999', null,     null,  null),
  (10, '99999', 'DDDDD',  null,  null),
  (11, null,    null,     null,  '0222'),
  (12, null,    'EEEEE',  null,  '0222'),
  (13, null,    'EEEEE',  null,  '0333');

Note: I can imagine the performance of this query being quite slow. A solution in PostgreSQL would be much more performant since, unlike SQL Server, it implements UNION in recursive CTEs. This could remove entire tree branches much earlier in the graph walk compared to UNION ALL (the only choice in SQL Server).

score 0 · Answer 4 · answered Aug 14 '19 at 05:00

For a question like this, 2-3 different sets of sample data help us understand the pattern in the data.

That helps in writing a better query.

DROP TABLE IF EXISTS #ID
CREATE TABLE #ID
    (
        RowId   INT,
        ID1 VARCHAR(100),
        ID2 VARCHAR(100),
        ID3 VARCHAR(100),
        ID4 VARCHAR(100)
    )

INSERT INTO #ID VALUES 
    (1,'11111',NULL,NULL,NULL),
    (2,'11111',NULL,NULL,NULL),
    (3,'11111','AAAAA',NULL,NULL),
    (4,NULL,'BBBBBB','BC1',NULL),
    (5,NULL,NULL,'BC1','O111'),
    (6,NULL,'GGGGG','BC1',NULL),
    (7,NULL,'AAAAA',NULL,'O111'),
    (8,NULL,'CCCCCC',NULL,NULL),
    (9,'99999',NULL,NULL,NULL),
    (10,'99999','DDDDDD',NULL,NULL),
    (11,NULL,NULL,NULL,'O222'),
    (12,NULL,'EEEEEE',NULL,'O222'),
    (13,NULL,'EEEEEE',NULL,'O333')

;WITH CTE AS
(
    -- Unpivot: one row per (RowId, IDType, IDNumber)
    SELECT DISTINCT RowId, IDNumber, IDType
    --,ROW_NUMBER() OVER (ORDER BY RowId) rn
    FROM (SELECT * FROM #ID) p
    UNPIVOT (IDNumber FOR IDType IN (ID1, ID2, ID3, ID4)) AS unpvt
)
,CTE2 AS
(
    SELECT c.*
          ,ROW_NUMBER() OVER (PARTITION BY RowId ORDER BY RowId DESC) AS rn1
    FROM CTE c
)
,CTE3 AS
(
    SELECT *
          ,DENSE_RANK() OVER (ORDER BY IDNumber) AS rn3
    --,1 rn3
    FROM CTE2 c
    WHERE rn1 = 1
      AND NOT EXISTS (SELECT 1 FROM CTE2 c1
                      WHERE c1.RowId = c.RowId AND c1.rn1 > c.rn1)
)
,CTE4 AS
(
    -- Anchor: rows that already carry a GroupId
    SELECT RowId, IDNumber, IDType, rn3 AS GroupId
          ,1 AS lvl
    FROM CTE3 c
    WHERE RowId > 1

    UNION ALL

    -- Recursive step: spread the GroupId to linked rows, capped at 8 levels
    SELECT c.RowId, c.IDNumber, c.IDType, c1.GroupId
          ,lvl + 1
    FROM CTE2 c
    INNER JOIN CTE4 c1 ON (
           (c.IDNumber = c1.IDNumber AND c.RowId <> c1.RowId)
        OR (c.RowId = c1.RowId AND c.IDNumber <> c1.IDNumber)
    )
    WHERE lvl <= 8
)
SELECT DISTINCT IDNumber, IDType, GroupId
--,RowId
FROM CTE4
ORDER BY GroupId

In CTE I first unpivot the data.

In CTE2 and CTE3 together I create the GroupId beforehand, according to what I have understood.

CTE4 is recursive.

This script can be optimized after checking 2-3 different sets of sample data.

The CTE4 result can then be pivoted again into your denormalized form.

I think this is an ideal situation to try a cursor with an optimized script.
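
As one possible iterative alternative to a cursor, a set-based WHILE loop can spread the smallest label through each linked set until nothing changes. This is only a sketch under assumptions: #ID exists as created above, identifiers are compared per column (IDType plus value), and #Link, GroupLabel and @changed are hypothetical names introduced for illustration.

```sql
-- Sketch: iterative label propagation instead of a recursive CTE.
DROP TABLE IF EXISTS #Link;

-- One row per (RowId, IDType, IDNumber); every source row starts as its own group.
SELECT i.RowId, v.IDType, v.IDNumber, i.RowId AS GroupLabel
INTO #Link
FROM #ID AS i
CROSS APPLY (VALUES ('ID1', i.ID1), ('ID2', i.ID2),
                    ('ID3', i.ID3), ('ID4', i.ID4)) AS v(IDType, IDNumber)
WHERE v.IDNumber IS NOT NULL;

DECLARE @changed INT = 1;

WHILE @changed > 0
BEGIN
    SET @changed = 0;

    -- Step 1: each identifier takes the smallest label it appears with.
    UPDATE l
    SET GroupLabel = m.MinLabel
    FROM #Link AS l
    INNER JOIN (SELECT IDType, IDNumber, MIN(GroupLabel) AS MinLabel
                FROM #Link
                GROUP BY IDType, IDNumber) AS m
        ON m.IDType = l.IDType AND m.IDNumber = l.IDNumber
    WHERE l.GroupLabel > m.MinLabel;
    SET @changed += @@ROWCOUNT;

    -- Step 2: each source row takes the smallest label among its identifiers.
    UPDATE l
    SET GroupLabel = m.MinLabel
    FROM #Link AS l
    INNER JOIN (SELECT RowId, MIN(GroupLabel) AS MinLabel
                FROM #Link
                GROUP BY RowId) AS m
        ON m.RowId = l.RowId
    WHERE l.GroupLabel > m.MinLabel;
    SET @changed += @@ROWCOUNT;
END;

-- Renumber the surviving labels as consecutive group numbers.
SELECT IDType, IDNumber,
       DENSE_RANK() OVER (ORDER BY MIN(GroupLabel)) AS GroupId
FROM #Link
GROUP BY IDType, IDNumber
ORDER BY GroupId, IDType, IDNumber;
```

Labels only ever decrease, so the loop stops once every linked set shares its smallest RowId. On the sample data this should give the same four groups as the normalized table in the question. The number of passes grows with the length of the longest chain of links, so very long chains may still be slow.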
