0

So I know the question does not make sense in the typical parent-child relationship, but in this case the parent and children are similar to google email groups. Each group (parent) can contain multiple other groups (children), and those children can also include other parents that also are linked to one of the ancestors. This is not a problem as the goal is to just determine which users belongs to a group, the recursion is not relevant nor wrong.

The basic data is available at SqlFiddle

CREATE TABLE SQLTest (
    Parent NVARCHAR(100) NULL
    , Child  NVARCHAR(100) NULL
)

INSERT INTO SQLTest
VALUES 
('A','B'),
('B','C'),
('C','A'),
('A','D'),
('X','Y')

with the query

with x (Parent,Child) as (
    select Parent, Child
    from SQLTest
    where Parent = 'A'
    union all
    select T.Parent,T.Child
    from SQLTest T
    join x on x.Child = T.Parent
)
select *
from x;

I was using PostgreSQL where this works as previous found records are not used for the next iteration of the join. However in SQL Server I am getting an infinite loop. I know how to stop the loop, and I also know I can make a materialized path to check for cycles, for reference see Microsoft Forums. However I am hoping there is something easier especially as I am using byte[] ulids for ids and don't really want to convert them and join and then do like. Hoping there is something easier and more performant that I am missing.

The expected output would be

A
B
C
D

Meaning just all the ids of all nodes, both the parents and children, that can be reached from A.

I am using the 2022 SQL Server version, fiddle is from an earlier version in case that matters.

Dale K
  • 25,246
  • 15
  • 42
  • 71
Aktaeon
  • 189
  • 2
  • 14
  • I don't believe there is any other solution when using a recursive CTE. You can write a while loop, but that IMO is a worse solution. – Dale K Aug 20 '23 at 02:51
  • 1
    You may find something of value in the two techniques shown in [this](https://stackoverflow.com/a/30586065/92546) answer. – HABO Aug 20 '23 at 03:28
  • Hi Dale, do you think the proposed solution based on HABO answer is a worse solution? In what regard ? Ah I see you mean really a loop, still curious about your thoughts regarding the alternative. – Aktaeon Aug 20 '23 at 14:58

2 Answers2

2

Including path into the recursion allows for breaking-out of a loop because we can test if a value already exists in the pathway prior to the current row. e.g:

WITH x(Parent, Child, Path) AS (
    SELECT
         Parent
        , Child
        , CAST(Parent + ',' + Child AS NVARCHAR(MAX))
    FROM SQLTest
    WHERE Parent = 'A'
    
    UNION ALL
    
    SELECT
          T.Parent
        , T.Child
        , CAST(x.Path + ',' + T.Child AS NVARCHAR(MAX))
    FROM SQLTest T
    JOIN x ON x.Child = T.Parent
    WHERE CHARINDEX(T.Child, x.Path) = 0
    )
SELECT parent as node FROM x
UNION
SELECT child as node FROM x  
node
A
B
C
D

dbfiddle.uk (sql server 2022)

Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51
  • 1
    OP specifically said they didn't want to use this approach. – Dale K Aug 20 '23 at 05:56
  • " I am hoping there is something easier" Preferred isn't forbidden & I don't know how to get easier than this. No matter how you approach it you must find a way to break out from the loop (and relying on maxrecusion isn't a good approach at all) – Paul Maxwell Aug 20 '23 at 06:02
  • You're not wrong, but OP is already familiar with that technique, and there are already a bunch of duplicates out there for it. In fact Habo's link is more relevant because it shows a loop based approach, which is an alternative to the recursive CTE approach. – Dale K Aug 20 '23 at 06:04
  • Hi Paul, thank you for your answer and yes this is how I would solve as well, worried about the path though due to each T.Child mentioned above being an ULID/GUID and for that reason can likely also contain the separator ',' and in rare cases perhaps match wrongly. I know I can then do case statements and make it more improbable etc but still not entirely a fan since I possibly can go many levels deep and then for each item will need to do a CHARINDEX on a let's say 10 levels * 16 byte = 160 byte string. However, your solution might still be more performant in the end. – Aktaeon Aug 20 '23 at 14:15
1

Yes the solution of Paul Maxwell is what I had in mind when writing the question of how I would solve in case there are no other solutions. However, the response of HABO put me into a different way of doing things as requested.

I still will need to benchmark, but this possibly is more performant as I can perhaps add indices, but I will need to investigate.

For future reference my solution, based on an earlier answer of HABO mentioned under my question, that generates the requested result is

declare @Start nvarchar(100) = 'A';
declare @Relations Table ( Parent nvarchar(100) not null, 
                     Child nvarchar(100) null, index cluster_1 (Parent,Child));
insert into @Relations
  select Parent, Child
    from SQLTest
    where Parent = @Start or Child = @Start;
while @@RowCount > 0
  insert into @Relations
    select RT.Parent, RT.Child
      from @Relations as R inner join
        SqlTest as RT on RT.Child = R.Child or RT.Parent = R.Parent or
          RT.Child = R.Parent or RT.Parent = R.Child
    except
    select Parent, Child
      from @Relations;
select Parent as Id from @Relations
              union select Child as Id from @Relations;
Aktaeon
  • 189
  • 2
  • 14