3

Assume a time series where we have one value per point in time. I have to calculate a continues sum by calculating the current value + the sum of the previous value. Tricky part however is, that the sum should be capped and hence not exceed a certain value.

Example: Sum is capped at +2.

+-----+---------+------------+
| row | measure | capped sum |
+-----+---------+------------+
|   1 |       1 |          1 |
|   2 |       3 |          2 |
|   3 |       4 |          2 |
|   4 |      -2 |          0 |
|   5 |       1 |          1 |
+-----+---------+------------+

e.g. the "capped sum" of row 4 is calulated using the "capped sum" from the previous row, "2", and adding the current value. As the result is < 2 we can take the value as it is.

Problem is, with HANA, I cannot use a LAG/Window function on the "capped sum" field. It will give me a "unknown column" error.

Any ideas how to resolve this in SQL/HANA SQL Script without using for loops (which would be very slow)?

Cœur
  • 37,241
  • 25
  • 195
  • 267
newBee
  • 1,289
  • 1
  • 14
  • 31
  • Not sure this can be done without using SUM OVER and MAX OVER. If you try to self-join to the next row and do the calculations that way, you cannot assume SQL will traverse the rows sequentially, which you need for a running total. I have a solution that uses windowing functions (but not LEAD or LAG) if you are interested. – Robert Sievers Feb 01 '17 at 20:57
  • Always interested in possible solutions. Even if it does not work for me it wont hurt or maybe help others. Feel free to drop it as a fiddle or even as solution. – newBee Feb 01 '17 at 20:59
  • can you use a recursive cte in hana? – Vamsi Prabhala Feb 01 '17 at 21:25
  • Unfortunately not. – newBee Feb 01 '17 at 21:29

2 Answers2

1

This script first creates a column of a running sum. Then it uses that column to create a column of "overage", how much the running sum exceeds the capped value, cumulatively. Then it subtracts the overage in order to give a value less than 2 if appropriate.

DECLARE @capped_value INT = 2
;WITH CTE AS
(SELECT rowID,measure,
    running_total = SUM(measure) OVER 
    (ORDER BY rowID ROWS UNBOUNDED PRECEDING)
FROM dbo.test_capped_sum)
,
CTE2 AS
(SELECT *,
    overage_total = MAX(CTE.running_total) 
    OVER (ORDER BY rowID ROWS UNBOUNDED PRECEDING) - @capped_value
FROM CTE)

SELECT rowid,measure,
    CASE WHEN CTE2.overage_total > 0
    THEN CTE2.running_total- CTE2.overage_total 
    ELSE CTE2.running_total END
    AS capped_sum FROM CTE2
Robert Sievers
  • 1,277
  • 10
  • 15
  • Relies on CTE which I cannot use. However I guess this can be helpful for others who can use CTEs – newBee Feb 01 '17 at 21:33
  • 1
    good solution except it fails if the first row has a measure less than the capped value. you can add a special condition for that. – Vamsi Prabhala Feb 01 '17 at 21:43
  • @newBee: Unless CTEs are recursive they're just a syntax variation of nested Derived Table: `SELECT ... FROM ( SELECT ... FROM (SELECT ... ) AS CTE) AS CTE2` and Hana will support these. – dnoeth Feb 02 '17 at 08:44
1

This solution uses a recursive-cte which isn't supported in HANA (as per the OP). Posting a solution which works with databases that support it.

WITH ROWNUMS AS
 (SELECT T.*,
         ROW_NUMBER() OVER(ORDER BY ROW) AS RNUM
  FROM T) 
,RCTE AS
 (SELECT ROW,
         RNUM,
         MEASURE,
         MEASURE AS CAPPED_SUM
  FROM ROWNUMS
  WHERE RNUM=1
   UNION ALL
   SELECT RN.ROW,
          RN.RNUM,
          RN.MEASURE,
          CASE
              WHEN R.CAPPED_SUM+RN.MEASURE>=2 THEN 2
              ELSE R.CAPPED_SUM+RN.MEASURE
          END
   FROM ROWNUMS RN
   JOIN RCTE R ON R.RNUM=RN.RNUM-1 )
SELECT ROW,
       MEASURE,
       CAPPED_SUM
FROM RCTE

Sample Demo

Vamsi Prabhala
  • 48,685
  • 4
  • 36
  • 58
  • Nice answer tho. However I can use for loops. Will adapt my solution accordingly. As this "algoromithm" cannot be parallized (a calculation always depends on the previous Iteration) it will eventually achieve similiar Performance. – newBee Feb 01 '17 at 22:01