OK, you asked for an answer without loops (Cursors, I assume), and while the cursor based solution was pretty efficient (a single forward only pass through) and relatively easy to understand, a set based solution is possible so I'll present it here. If you get a chance to compare them I'm curious how different the performance is, but regardless here is your set based answer. Note that it uses no cursors, no temp tables, and no recursive CTEs, so it should be very fast.
EDIT: The original version produced too small a tally table. Normally in pure SQL I'd use ROW_NUMBER on a big system table or view like sys.objects
, or maybe specially build one (many articles on the internet about building tally tables) but I'm unfamiliar with Snowflake and wasn't sure if you'd have access to system objects, so I built a bigger one by cross joining cteExp
(the expanded names of all RightName
instances). I haven't worked out why my original math was wrong or under what conditions this might possibly still be too small (I doubt it though, this will draw from a list bigger than the number of distinct RNs squared) but here is a corrected version. I don't have time to debug this further today, though perhaps this weekend I'll revisit and try to provide commentary.
WITH cteSample as (
SELECT * FROM (VALUES
(CONVERT(nvarchar(20), '10'),CONVERT(nvarchar(50),'R1')
,CONVERT(datetime2(0),'2023-01-01 00:00:00.000')
,CONVERT(datetime2(0),'2023-10-31 00:00:00.000')),
('10','R1','2024-04-01 00:00:00.000','2026-12-31 00:00:00.000'),
('10','R1','2022-04-01 00:00:00.000','2025-12-31 00:00:00.000'),--Used for edge testing
('10','R1','2021-01-01 00:00:00.000','2025-01-31 00:00:00.000'),--Used for edge testing
('10','R1','2020-06-01 00:00:00.000','2024-05-01 00:00:00.000'),--Used for edge testing
('10','R1','2020-02-01 00:00:00.000','2021-02-28 00:00:00.000'),--Used for edge testing
('10','R1','2023-01-01 00:00:00.000','2024-12-31 00:00:00.000'),--Used for edge testing
('10','R2','2023-02-01 00:00:00.000','2027-12-31 00:00:00.000'),
('10','R2','2020-01-01 00:00:00.000','2022-12-31 00:00:00.000'),--Used for edge testing
('10','R2','2021-01-01 00:00:00.000','2022-12-31 00:00:00.000'),--Used for edge testing
('10','R3','2023-02-01 00:00:00.000','2024-07-31 00:00:00.000'),
('10','R3','2024-10-01 00:00:00.000','2026-12-31 00:00:00.000'),
--('10','R3','2024-12-01 00:00:00.000','2025-12-01 00:00:00.000'), --Used for edge testing
('10','R4','2022-01-01 00:00:00.000','2026-12-31 00:00:00.000')
--, ('10','R4','2023-01-01 00:00:00.000','2025-12-31 00:00:00.000')--Used for edge testing
) as SampleTable(ASSETCODE, RIGHTNAME, ACQUISITIONBEGINDATE, ACQUISITIONENDDATE)
), cteIDs as (
SELECT *
, RIGHTNAME as RNBase
--We can use IDOverAll if JOIN on RIGHTNAME is a bottleneck, any ID will work
, ROW_NUMBER() OVER (ORDER BY RIGHTNAME, ACQUISITIONBEGINDATE) -1 as IDOverAll
--For IDInGroup it MUST be consecutive whole numbers!
, ROW_NUMBER() OVER (PARTITION BY RIGHTNAME ORDER BY ACQUISITIONBEGINDATE) -1 as IDInGroup
FROM cteSample as S
), cteBaseRNs as (
SELECT RNBase, COUNT(*) as GroupSize
--For IDOfGroup we can use any ID
, ROW_NUMBER() OVER (ORDER BY RNBase) as IDOfGroup
FROM cteIDs as S
GROUP BY RNBase
),cteWeighted as (
SELECT *
--exp(sum(log(column))) effectively makes product
--see https://stackoverflow.com/questions/3653586/sql-server-query-groupwise-multiplication
, sum(log(GroupSize)) OVER (ORDER BY IDOfGroup ROWS UNBOUNDED PRECEDING) as PreWeightLog
, sum(log(GroupSize)) OVER (ORDER BY IDOfGroup DESC ROWS UNBOUNDED PRECEDING) as PostWeightLog
, log(GroupSize) as ThisWeightLog
FROM cteBaseRNs
), cteExp as (--Expanded RightNames to include _#
SELECT I.*, S.GroupSize, S.IDOfGroup
--If there is only 1 RN acquisition range, leave it bare
, CASE WHEN S.GroupSize = 1 THEN I.RNBase
ELSE CONCAT (I.RNBase, '_', CONVERT(nvarchar(5), IDInGroup))
END as RNExt --But if more then 1 append _0, _1, ...
FROM cteBaseRNs as S INNER JOIN cteIDs as I on I.RNBase = S.RNBase
), cteGroups as (
SELECT W.RNBase, W.GroupSize, W.IDOfGroup
, EXP(PreWeightLog - ThisWeightLog) as PreWeight --How many combinations before
--How many combinations in total
, EXP(PreWeightLog + PostWeightLog - ThisWeightLog) as TotalWeight
FROM cteWeighted as W
),ctePreTally as ( --We need a set of consecutive whole numbers the same size as the output
--In pure SQL I'd use a view like sys.objects to generate it, but I don't know what's
--available in snowflake, so I'll overbuild from CROSS JOINS on tables we have
SELECT ROW_NUMBER() OVER (ORDER BY E.IDOverAll) - 1 as Tally
, G.TotalWeight as NumberOfCombos
, (SELECT COUNT(*) FROM cteGroups) as NumberOfGroups
FROM cteExp as E CROSS JOIN cteExp as E2 CROSS JOIN cteGroups as G
),cteTally as ( --We need a set of consecutive whole numbers the same size as the output
SELECT Tally, NumberOfCombos, NumberOfGroups
FROM ctePreTally
WHERE Tally < NumberOfCombos * NumberOfGroups
), ctePreOut as ( --Build the output list with each distinct RN for each of the number of combinations
SELECT G.*, T.Tally + 1 as OutGroup --For user display
, CONVERT(bigint,FLOOR(T.Tally/PreWeight)) % GroupSize as IDInGroup
FROM cteTally as T CROSS JOIN cteGroups as G
WHERE T.Tally < NumberOfCombos
) SELECT --O.*, '---', G.* --Now build the final output and put it in the right order,
--adding back in the details we didn't carry like Acqutsition dates
G.AssetCode, G.RightName, G.RNExt, G.ACQUISITIONBEGINDATE, G.ACQUISITIONENDDATE, O.OutGroup
FROM ctePreOut as O
INNER JOIN cteExp as G
on O.RNBase = G.RNBase AND O.IDInGroup = G.IDInGroup
ORDER BY O.OutGroup, O.IDOfGroup
This produces the following output
AssetCode |
RightName |
RNExt |
ACQUISITIONBEGINDATE |
ACQUISITIONENDDATE |
OutGroup |
10 |
R1 |
R1_0 |
2023-01-01 00:00:00 |
2023-10-31 00:00:00 |
1 |
10 |
R2 |
R2 |
2023-02-01 00:00:00 |
2027-12-31 00:00:00 |
1 |
10 |
R3 |
R3_0 |
2023-02-01 00:00:00 |
2024-07-31 00:00:00 |
1 |
10 |
R4 |
R4 |
2022-01-01 00:00:00 |
2026-12-31 00:00:00 |
1 |
10 |
R1 |
R1_1 |
2024-04-01 00:00:00 |
2026-12-31 00:00:00 |
2 |
10 |
R2 |
R2 |
2023-02-01 00:00:00 |
2027-12-31 00:00:00 |
2 |
10 |
R3 |
R3_0 |
2023-02-01 00:00:00 |
2024-07-31 00:00:00 |
2 |
10 |
R4 |
R4 |
2022-01-01 00:00:00 |
2026-12-31 00:00:00 |
2 |
10 |
R1 |
R1_0 |
2023-01-01 00:00:00 |
2023-10-31 00:00:00 |
3 |
10 |
R2 |
R2 |
2023-02-01 00:00:00 |
2027-12-31 00:00:00 |
3 |
10 |
R3 |
R3_1 |
2024-10-01 00:00:00 |
2026-12-31 00:00:00 |
3 |
10 |
R4 |
R4 |
2022-01-01 00:00:00 |
2026-12-31 00:00:00 |
3 |
10 |
R1 |
R1_1 |
2024-04-01 00:00:00 |
2026-12-31 00:00:00 |
4 |
10 |
R2 |
R2 |
2023-02-01 00:00:00 |
2027-12-31 00:00:00 |
4 |
10 |
R3 |
R3_1 |
2024-10-01 00:00:00 |
2026-12-31 00:00:00 |
4 |
10 |
R4 |
R4 |
2022-01-01 00:00:00 |
2026-12-31 00:00:00 |
4 |