3

I have the following two table E and G.

create table E(K1 int, K2 int primary key (K1, K2))

insert E 
values (1, 11), (1, 20), (2, 10), (2, 30), (3, 10), (3, 30), 
    (4, 100), (5, 200), (6, 200),
    (7, 300), (8, 300), (9, 310), (10, 310), (10, 320), (11, 320), (12, 330)

create table G(GroupID varchar(10), K1 int primary key)

insert G 
values ('Group 1', 1), ('Group 1', 2), ('Group 2', 4), ('Group 2', 5),
       ('Group 3', 8), ('Group 3', 9), ('Group 3', 12)

I need to a view - giving a K2 number, find all related K1. The "related K1" is defined:

  1. All K1s have the same K2 in table E. For example, 2 and 3 in E are related because both records have K2 of 10. ((2, 10), (3, 10)).

  2. All K1s have the same GroupID in table G. For example, the K1 of 1 and 2 are both in group Group 1.

So querying the following view

select K1 from GroupByK2 where K2 = 200 -- or 100

should return

4
5
6

because both (5, 200) and (6, 200) have the same K2. And the 4 and 5 of (4, 100) and (5, 200) are both in 'Group 2'.

And select K1 from GroupByK2 where K2 = 300 -- or 310, 320, 330 should return 7, 8, 9, 10, 11, 12.

View:

create view GroupByK2
as
with cte as (
    select E.*, K2 K2x from E
    union all
    select E.K1, E.K2, cte.K2x
    from cte 
        join G on cte.K1 = G.K1
        join G h on h.GroupID = G.GroupID
        join E on E.K1 = h.K1 and E.K1 <> cte.K1
    where not exists (select * from cte x where x.k1 = G.k1 and x.K2 = G.K2) -- error
)
select *
from cte;

However, the SQL has the error of

Recursive member of a common table expression 'cte' has multiple recursive references?

ca9163d9
  • 27,283
  • 64
  • 210
  • 413

1 Answers1

1

Scratched my head over this one a bit, but here is a working, although highly inefficient solution...

You correctly tried to eliminate joining the original rows back to avoid the cyclic recursion, but it won't work due to 2 reasons:

  1. As the error stated, you can't reference the recursive member more than once
  2. Even if you could, at each recursion, the recursive set consists only of the output of the previous recursion, so you wouldn't be able to eliminate the cycles from earlier recursions anyway.

My solution avoids that in a "less than optimal" way, it simply includes all the rows with the cycles, but limits the recursion level to a hard number (5 in the example, but you can parameterize it as well) to avoid the endless recursion, and only at the final query, eliminates the duplicates with a group by.

This may or not work for you depending on the depth of the hierarchy. It creates tons of redundant work, and I doubt it will scale, but YMMV. I addressed it as a logical puzzle :-)

This is one of the (rare) cases where I will definitely consider an iterative solution instead of a set based one. You will need to create a table valued function so you can parameterize it, which you won't be able to do properly with a view. Within the function create a temporary table or table variable, populate it with the output sets one by one, and loop until you are done. This way you will be able to eliminate the cycles at the root by checking the content of the temporary table and only inserting new rows.

Anyway, here goes:

;WITH KeyGroups AS 
(
SELECT E.*, G.GroupID
FROM   E
       LEFT OUTER JOIN 
       G 
       ON E.K1 = G.K1
),
Recursive AS
(
SELECT K.K1, K.K2, K.GroupID, 0 AS lvl 
FROM   KeyGroups AS K
WHERE  K.K2 = 300
UNION  ALL
SELECT K.K1, K.K2, K.GroupID, lvl + 1
FROM   Recursive AS R
       INNER JOIN 
       KeyGroups AS K
       ON R.GroupID = K.GroupID
          OR
          R.K2 = K.K2
          OR 
          R.K1 = K.K1
WHERE  lvl < 5
)
SELECT MIN(lvl) AS lvl, K1, K2, GroupID 
FROM   Recursive
GROUP BY GroupID, K1, K2
ORDER BY lvl, K1, K2, GroupID;

Also see DBFiddle.

I'll give this some more thought tomorrow if I have time, and update here if I find a better solution.

Thanks for the interesting challenge and well formulated post.

HTH

SQLRaptor
  • 671
  • 4
  • 14
  • It runs slowly with real data. BTW, since there is `lvl < 5` in the query. Maybe just creating five subqueries can be faster. – ca9163d9 Aug 07 '19 at 16:36
  • you can change lvl to any number to match the max depth of your data. Even better if you parametrize it in the function, so you can dynamically assign a max level. I assumed it would run slowly on large data sets, which may or may not be an issue. as I said = YMMV... and I would definitely compare alternatives such as an iterative approach with a loop. From your proficiency level, it seems this is something you can handle, but if you want help, let us know. The iterative solution logic is much simpler. – SQLRaptor Aug 07 '19 at 17:13