0

In SQL Server 2012 my (simplified) table looks like this:

Key  SubKey Quantity
--------------------
96614   1   0.604800
96615   1   1.920000
96615   2   3.840000
96616   1   1.407600
96617   1   0.453600
96617   2   3.568320
96617   3   2.710260
96618   1   11.520000
96619   1   0.453600
96620   1   7.919100
96620   2   4.082400
96626   1   14.394000
96627   1   9.525600
96627   2   4.762800
96627   3   4.536000
96628   1   2.268000

My query needs to identify consecutive keys (the SubKeys are basically irrelevant) and group them into ranges, summing the Quantities appropriately. So the expected output from the above would be:

KeyRange    TotalQuantity
-------------------------
96614-96620 38.47968
96626-96628 35.48640

I've tried to follow a number of examples that use window functions but I think because they are catering for different purposes they didn't make a lot of sense to me. Is this the right approach to take?

Wayne Ivory
  • 321
  • 2
  • 8
  • The answers from both MonkeyPushButton and scrawny meet the requirement in the question and under performance testing against 30,000 records both took less than a second. I've chosen the Answer from @MonkeyPushButton because the code is tighter and I particularly like the way he makes use of the fact that the number of RangeEnds will be the same as the number of RangeStarts and they can therefore be joined on a generated ID. – Wayne Ivory Mar 27 '19 at 04:49

3 Answers3

1

I don't think you can use built ins directly, although they are part of my solution. Code below basically detects range starts and ends (there isn't an entry in the table with a key value of one less / greater respectively) and uses those to group the data joining to it with a between clause.

WITH RangeStarts AS (
  SELECT 
    ROW_NUMBER () OVER (ORDER BY [Key] ASC) RangeId, 
    [Key] RangeStart
  FROM (SELECT DISTINCT [Key] FROM ConsKeyAsTable t) t
  WHERE NOT Exists (
    SELECT * FROM ConsKeyAsTable t2 WHERE t2.[Key] = t.[Key] - 1
  )
)
,RangeEnds AS (
  SELECT
    ROW_NUMBER () OVER (ORDER BY [Key] ASC) RangeId, 
    [Key] RangeEnd
  FROM (SELECT DISTINCT [Key] FROM ConsKeyAsTable t) t
  WHERE NOT Exists (
    SELECT * FROM ConsKeyAsTable t2 WHERE t2.[Key] = t.[Key] + 1
  )
)
SELECT 
  Cast(s.RangeStart as varchar(10)) + '-' + Cast(e.RangeEnd as varchar(10)) as KeyRange,
  SUM(t.Quantity) as Quantity
FROM RangeStarts s
  INNER JOIN RangeEnds e ON s.RangeId = e.RangeId
  INNER JOIN ConsKeyAsTable t ON t.[Key] BETWEEN s.RangeStart AND e.RangeEnd
GROUP BY
  s.RangeStart,
  e.RangeEnd

Sql fiddle http://sqlfiddle.com/#!18/080fa/31

Setup code

CREATE TABLE ConsKeyAsTable ([Key] int NOT NULL, [SubKey] int NOT NULL, Quantity float, Constraint PK PRIMARY KEY CLUSTERED ([Key], [SubKey]))

INSERT ConsKeyAsTable VALUES 
(96614,   1,   0.604800),
(96615,   1,   1.920000),
(96615,   2,   3.840000),
(96616,   1,   1.407600),
(96617,   1,   0.453600),
(96617,   2,   3.568320),
(96617,   3,   2.710260),
(96618,   1,   11.520000),
(96619,   1,   0.453600),
(96620,   1,   7.919100),
(96620,   2,   4.082400),
(96626,   1,   14.394000),
(96627,   1,   9.525600),
(96627,   2,   4.762800),
(96627,   3,   4.536000),
(96628,   1,   2.268000)
MonkeyPushButton
  • 1,077
  • 10
  • 19
  • Thanks, nice work @MonkeyPushButton. I can honestly say I thought of a similar-yet-different approach this morning before seeing your answer, but I haven't tried it out yet. I've up-voted you but I'll try mine out before deciding whether to award you the tick. – Wayne Ivory Mar 20 '19 at 00:56
1

Using a combination of window functions and sequential number tallies with recursive CTE's, the following should work (and will also take care of singular ranges in amongst the sample; see the setup SQL statement below):

DECLARE @start INT = (SELECT MIN(pKey) FROM @t);
DECLARE @end INT = (SELECT MAX(pKey) FROM @t);

WITH cte_RangeTally AS (
    SELECT @start num
    UNION ALL
    SELECT num + 1 FROM cte_RangeTally WHERE num+1 <= @end),
cte_Group AS (
    SELECT 
        T.pKey,
        ROW_NUMBER() OVER (ORDER BY RT.num) - ROW_NUMBER() OVER (ORDER BY T.pKey) grp
    FROM
        cte_RangeTally RT
    LEFT JOIN 
        (SELECT pKey 
        FROM @t 
        GROUP BY pKey) T ON RT.num = T.pKey),
cte_NumRanges AS (
    SELECT
        pKey,
        FIRST_VALUE(pKey) OVER(PARTITION BY grp 
                               ORDER BY pKey
                               ROWS BETWEEN UNBOUNDED PRECEDING 
                               AND CURRENT ROW) AS FirstNum,
        LAST_VALUE(pKey) OVER(PARTITION BY grp 
                               ORDER BY pKey
                               ROWS BETWEEN UNBOUNDED PRECEDING 
                               AND UNBOUNDED FOLLOWING) AS LastNum
    FROM
        cte_Group
    WHERE 
        cte_Group.pKey IS NOT NULL)
SELECT 
    CAST(NR.FirstNum AS VARCHAR(10)) + ' - ' + CAST(NR.LastNum AS VARCHAR(10)),
    SUM(T1.Quantity) AS TotalQty
FROM
    cte_NumRanges NR
RIGHT JOIN
    @t T1 ON T1.pKey = NR.pKey
GROUP BY 
    NR.FirstNum, 
    NR.LastNum;

Assuming the following setup code:

DECLARE @t TABLE (pKey INT, SubKey INT, Quantity FLOAT);

INSERT @t VALUES 
(96614,   1,   0.604800),
(96615,   1,   1.920000),
(96615,   2,   3.840000),
(96616,   1,   1.407600),
(96617,   1,   0.453600),
(96617,   2,   3.568320),
(96617,   3,   2.710260),
(96618,   1,   11.520000),
(96619,   1,   0.453600),
(96620,   1,   7.919100),
(96620,   2,   4.082400),
(96626,   1,   14.394000),
(96627,   1,   9.525600),
(96627,   2,   4.762800),
(96627,   3,   4.536000),
(96628,   1,   2.268000),
(96630,   1,   2.165000),
(96632,   1,   2.800000),
(96633,   1,   2.900000);
scrawny
  • 141
  • 1
  • 7
  • Another good answer - nice to see how the window functions are used as part of the solution. – Wayne Ivory Mar 27 '19 at 03:08
  • For performance testing I attempted to run the routine against a table of 250,000 records. The first problem I encountered was that the default maximum recursion for a CTE is 100 levels and the code as presented consumes one level for each record in the table. And when I tried to set the MAXRECURSION option to 250000 SQL Server reported that the maximum allowable is 32767. (In practice these numbers are excessive but it is worth highlighting the limitation in the answer.) – Wayne Ivory Mar 27 '19 at 04:43
0

(Edit: As pointed out by @scrawny this solution doesn't currently support singular ranges.)

The idea I'd had independently of the answer @MonkeyPushButton posted didn't pan out - I was trying to use LAG and LEAD and a few other techniques but couldn't get it operational. However in the process I had another idea which I've posted here. I'm not convinced it's "better" than Monkey's but thought others may be interested. (I did totally plagiarise his setup code which I hope is okay.)

SQL Fiddle http://sqlfiddle.com/#!18/8e86a/3

CREATE TABLE MyTable ([Key] int NOT NULL, [SubKey] int NOT NULL, Quantity float, Constraint PK PRIMARY KEY CLUSTERED ([Key], [SubKey]))

INSERT MyTable VALUES 
(96614,   1,   0.604800),
(96615,   1,   1.920000),
(96615,   2,   3.840000),
(96616,   1,   1.407600),
(96617,   1,   0.453600),
(96617,   2,   3.568320),
(96617,   3,   2.710260),
(96618,   1,   11.520000),
(96619,   1,   0.453600),
(96620,   1,   7.919100),
(96620,   2,   4.082400),
(96626,   1,   14.394000),
(96627,   1,   9.525600),
(96627,   2,   4.762800),
(96627,   3,   4.536000),
(96628,   1,   2.268000)

Four invocations of the table are used to create the set of key ranges. t1 and t4 create the StartKey and t2 and t3 create the EndKey.

WITH cte_KeyRange AS
     (
            SELECT [Key] AS StartKey,
                    (
                          SELECT MIN([Key])
                          FROM MyTable t2
                          WHERE t2.[Key] > t1.[Key]
                                 AND NOT EXISTS
                                 (
                                        SELECT [Key]
                                        FROM MyTable t3
                                        WHERE t3.[Key] = t2.[Key] + 1
                                 )
                   ) AS EndKey
            FROM MyTable t1
            WHERE NOT EXISTS
                   (
                          SELECT [Key]
                          FROM MyTable t4
                          WHERE t4.[Key] = t1.[Key] - 1
                   )
     )
SELECT CAST(StartKey AS varchar(10)) + '-' + CAST(EndKey AS varchar(10)) AS KeyRange, SUM(Quantity) AS TotalQuantity
FROM cte_KeyRange INNER JOIN MyTable ON [Key] BETWEEN StartKey AND EndKey
GROUP BY StartKey, EndKey
Wayne Ivory
  • 321
  • 2
  • 8
  • This might not work for singular ranges. For instance, using the example above, after 96628 comes 96630 then 96632, 96633 etc. – scrawny Mar 21 '19 at 04:36
  • You're correct @scrawny. I'll edit my answer to alert others of the deficiency for now and possibly come back to it later to try and program that bug out. (Or somebody else is welcome to try.) – Wayne Ivory Mar 27 '19 at 02:55