I have a table in SQL Server with two columns, containing integers, for example:
1 2
2 7
5 7
7 10
10 11
12 13
13 14
I need to group them based on common integers. In the example
1, 2
and 2, 7
get into the first group because they share common integer 2
. 5, 7
get into this first group as well as they share 7
. So does 7, 10
follow into this first group. And 10, 11
follow into first group as well.
But 12
and 13
have no common members with the first group, so they create their own group, where 13, 14
are added.
So in the output we have
1 1
1 2
1 5
1 7
1 10
1 11
2 12
2 13
2 14
I hope the principle is clear. The naming of the groups is irrelevant.
Table is filtered so that left integer is less than the right one and every row is unique.
To achieve this goal I've written a code in T-SQL using recursive query ([dbo].[CDI]
is the source table) :
CREATE TABLE [dbo].[CDI_proxy]
(
[ID1] [bigint] NOT NULL,
[ID2] [bigint] NOT NULL,
primary key ([ID1], [ID2])
) ON [PRIMARY]
CREATE TABLE [dbo].[CDL]
(
[ID1] [bigint] NOT NULL,
[ID2] [bigint] NOT NULL,
[cnt] [int] NOT NULL default(0),
primary key ([ID1], [ID2])
) ON [PRIMARY]
create nonclustered index IX_1
on [dbo].[CDL] ([cnt]) include ([ID1], [ID2])
CREATE TABLE [dbo].[CDR]
(
[ID1] [bigint] NOT NULL,
[ID2] [bigint] NULL
) ON [PRIMARY]
insert into [dbo].[CDI_proxy]
select
d1.ID1,
d2.ID2
from
[dbo].[CDI] d1
union
select
d2.ID2,
d2.ID1
from
[dbo].[CDI] d2;
WITH cte([ID1], [ID2], LVL) AS
(
--Anchor Member
(SELECT
d1.ID1,
d1.ID2,
0 as LVL
FROM
[dbo].[CDI_proxy] d1
)
UNION ALL
--Recursive Member
SELECT
r.[ID1],
cte.[ID2],
LVL + 1 AS LVL
FROM
[dbo].[CDI_proxy] r
INNER JOIN
cte ON r.ID2 = cte.ID1
)
INSERT INTO [dbo].[CDL]([ID1], [ID2])
SELECT DISTINCT
cte1.[ID1],
cte1.[ID2]
FROM
cte cte1
OPTION (MAXRECURSION 0)
UPDATE [dbo].[CDL]
SET [cnt] = ag.cnt
FROM
(SELECT cdl.ID1, COUNT(cdl.ID2) AS cnt
FROM [dbo].CDL cdl
GROUP BY cdl.ID1) ag
WHERE ag.ID1 = [CDL].ID1
INSERT INTO [dbo].[CDR] ([ID1], [ID2])
SELECT
[ID1], [ID2]
FROM
(SELECT
cdl.*,
ROW_NUMBER() OVER (PARTITION BY cdl.ID2
ORDER BY cdl.cnt DESC, cdl.ID1 DESC) rnk
FROM
[dbo].[CDL] cdl) cdl
WHERE
rnk = 1
I run this script on approx. 5 million rows and it takes 3 hours to run without any ending (I turn it out). If I change part of the script like this
--Recursive Member
SELECT r.[ID1],
cte.[ID2],
LVL + 1 AS LVL
FROM [dbo].[CDI_proxy] r
inner join
cte ON r.ID2=cte.ID1
where LVL > 5
)
INSERT INTO ...
then it runs for 3 minutes and when afterwards I look at the result of the query
select id1, count(*) cnt
from dbo.CDR
group by id1
having count(*) > 5
order by cnt desc
then the highest group has only 8 members.
I suspect that my query goes into infinite recursion when LVL is less than 5. Is it possible and if yes then how?
Or did I make a mistake in my code?