1

I have a dataset as follows:

;WITH CTE AS
( SELECT *
  FROM (VALUES
(1, 10, 20, 30)
(2, 10, 21, 31)
(3, 11, 21, 31)
(4, 12, 22, 32)
(5, 13, 23, 33)
(6, 14, 24, 33)
(7, 14, 25, 34)
(8, 15, 26, 36)
) AS MyValues(ID, GroupID1, GroupID2, GroupID3)
)
SELECT *
FROM CTE

How can I generate the following collapsing the individual groups into a single group?

ID SingleGroupID
1 1
2 1
3 1
4 2
5 3
6 3
7 3
8 4
GMB
  • 216,147
  • 25
  • 84
  • 135
Danny Rancher
  • 1,923
  • 3
  • 24
  • 43
  • 1
    Your question doesn't really explain what your new SingleGroup is supposed to combine. – Marc Guillot May 17 '23 at 13:38
  • 1
    Perhaps also explain the relevance of the columns and what determins the singlegroupId? Or are we to guess? – Stu May 17 '23 at 13:39
  • 1
    i think he wants to match together various IDs by the groupIDs. so, as long as one of the groupids matches any other row's groupIDs, these two should be combined together. finally, when all the dust settles, every row should be part of some global group set. Unfortunately, the question has been asked and answered many times already, so @OP, you might wanna search for previous questions regarding to grouping rows by some ID – siggemannen May 17 '23 at 14:30
  • 1
    This is a graph-walking problem, where OP wants to group together `id`s that have at least one group in common. Voting to reopen the question, since (1) it is hard to solve, even when looking at other existing answers (and the question was not closed as a duplicate but as unclear) (2) it is interesting! – GMB May 17 '23 at 14:35
  • 1
    Does this answer your question? [Creating a single ID for any combination of matches between three different identifiers](https://stackoverflow.com/questions/75559657/creating-a-single-id-for-any-combination-of-matches-between-three-different-iden) – siggemannen May 17 '23 at 14:48
  • @GMB, i thought it looked familiar, found something very similar :D – siggemannen May 17 '23 at 14:49

1 Answers1

5

This is a typical graph-walking problem, where you want to build islands of ids that have at least one group in common.

We can start by unpivoting the groups to build a list of nodes, and generate edges with a self-join (edges connect ids that share the same group). We can then recursively traverse the edges, while keeping track of the path we followed until there. The last step is to aggregate by id.

So:

with 
    nodes as (
        select t.id, v.grp
        from mytable t
        cross apply ( values (t.GroupID1), (t.GroupID2), (t.GroupID3) ) v(grp)
    ),
    edges as (
        select distinct n1.id as id1, n2.id as id2
        from nodes n1
        inner join nodes n2 on n1.grp = n2.grp
    ),
    rec as (
        select id1, id2, cast(id1 as nvarchar(max)) as visited from edges
        union all
        select r.id1, e.id2, concat(r.visited, ',', e.id2)
        from rec r
        inner join edges e on e.id1 = r.id2
        where concat(',', r.visited, ',') not like concat('%,', e.id2, ',%')
    ),
    fin as (
        select id1, min(value) min_id
        from rec r
        cross apply string_split(r.visited, ',')
        group by id1
    )
select id1 as id, dense_rank() over(order by min_id) grp
from fin f
id grp
1 1
2 1
3 1
4 2
5 3
6 3
7 3
8 4

fiddle

GMB
  • 216,147
  • 25
  • 84
  • 135
  • wow, thank you. I don't think I quite appreciated the complexity of the required solution. I am struggling to run this query on larger datasets possibly due to the where clause. I understand that the where clause is used to prevent cycles or loops and it ensure that the recursion stops when the node has already been visited before. Do you think it is possible to reduce the where clause to operate only on IDs instead? or is this where clause always required as-is? Thank you. – Danny Rancher May 18 '23 at 16:27
  • 1
    @DannyRancher: yes the query will be quite expensive if there are lots of paths to follow. Not sure what you mean by *operate on IDs instead*, though. Basically, we need to explore all possible paths to be able to identify the clusters in the graph (ie the new groups). – GMB May 18 '23 at 21:35