
I have a large amount of data, and I need to implement a product aggregation over its values. Let me explain with an example to make it clear.

This is a sample data-

/*SampleTable*/
|ID|Date  |Value  |
| 1|201401|25     |
| 1|201402|-30    |
| 1|201403|-15    |
| 1|201404|50     |
| 1|201405|70     |

| 2|201010|1.15   |
| 2|201011|1.79   |
| 2|201012|0.82   |
| 2|201101|1.8    |
| 2|201102|1.67   |

I have to produce this table:

/*ResultTable*/
|ID|Date  |Aggregated Value  |
| 1|201312|100               |
| 1|201401|125               |
| 1|201402|87.5              |
| 1|201403|74.375            |
| 1|201404|111.563           |
| 1|201405|189.657           |

| 2|201009|100               |
| 2|201010|101.15            |
| 2|201011|102.960           |
| 2|201012|103.804           |
| 2|201101|105.673           |
| 2|201102|107.438           |
-- Note: the 100 values are inserted separately for each ID, at the month
-- before that ID's first date in the table above

Here, for each ID, I have a Value given with a corresponding Date (in YYYYMM format). I have to implement the following formula to calculate the Aggregated Value column, grouped by ID:

current_aggregated_value = previous_aggregated_value * ((current_value / 100) + 1)

There was no easy solution for this. To calculate the aggregated value for the current row, I have to take the aggregated value of the previous row, which is itself generated by the same query (except for the initial 100, which is inserted manually). As SQL cannot reference a value generated by the same query at runtime, I had to implement a product aggregate function as described here.

So the 2nd aggregated value (125) was derived from 100 * ((25 / 100) + 1) = 125,
and the 3rd aggregated value (87.5) from 125 * ((-30 / 100) + 1) = 87.5.
But since we cannot reference the generated 125 at runtime, I had to take the product aggregate of all previous values: 100 * ((25 / 100) + 1) * ((-30 / 100) + 1) = 87.5.
Similarly, the 4th value (74.375) comes from 100 * ((25 / 100) + 1) * ((-30 / 100) + 1) * ((-15 / 100) + 1) = 74.375.
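The running product above can be sanity-checked outside SQL; this short Python sketch (an illustration of the arithmetic, not part of the query) reproduces the Aggregated Value column for ID 1:

```python
# Each aggregated value is the previous one times ((value / 100) + 1),
# starting from the manually inserted seed of 100.
values = [25, -30, -15, 50, 70]  # monthly values for ID 1

aggregated = [100.0]
for v in values:
    aggregated.append(aggregated[-1] * ((v / 100) + 1))

print(aggregated)  # 100, 125, 87.5, 74.375, 111.5625, 189.65625 (up to float rounding)
```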

Here is a sample query:

INSERT INTO ResultTable (ID, [Date], [Aggregated Value])
SELECT temps.ID, temps.[Date],
    CASE
       WHEN temps.min_val = 0 THEN 0
       WHEN temps.negative_count % 2 = 1 THEN -1 * EXP(temps.abs_multiplier) * 100
       ELSE EXP(temps.abs_multiplier) * 100
    END AS value

FROM
(
   SELECT st1.ID, st1.[Date],
       -- Sum of logs of absolute values = log of the product of absolute values
       SUM(LOG(ABS(NULLIF(((st2.Value / 100) + 1), 0)))) AS abs_multiplier,
       -- Count of negative multipliers; the product is negative if the count is odd
       SUM(CASE WHEN ((st2.Value / 100) + 1) < 0 THEN 1 ELSE 0 END) AS negative_count,
       -- If any multiplier is 0, the whole product is 0
       MIN(ABS((st2.Value / 100) + 1)) AS min_val
   FROM SampleTable AS st1
   INNER JOIN SampleTable AS st2 ON (st2.ID = st1.ID AND st2.[Date] <= st1.[Date])
   GROUP BY st1.ID, st1.[Date]
) AS temps;

Basically, it takes the product aggregate over all previous values for each row to calculate the desired value. It is as messy as it sounds and looks, and "h-word" slow! But I couldn't find any better solution for this kind of problem on SQL Server 2008 R2 (unless you can give me one).
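The LOG/EXP trick the query relies on is not SQL-specific. Here is a minimal Python sketch of the same idea (my own illustration): take the exponential of the sum of logs of absolute values, track the sign via the count of negative multipliers, and short-circuit to 0 if any multiplier is 0:

```python
import math

def product_aggregate(multipliers):
    # Mirrors the query: a zero factor forces the product to 0 (the
    # MIN(ABS(...)) check), the count of negative factors decides the sign,
    # and EXP(SUM(LOG(ABS(...)))) gives the product of absolute values.
    if any(m == 0 for m in multipliers):
        return 0.0
    negatives = sum(1 for m in multipliers if m < 0)
    abs_product = math.exp(sum(math.log(abs(m)) for m in multipliers))
    return -abs_product if negatives % 2 == 1 else abs_product

# Multipliers for ID 1 up to 201403: (25/100)+1, (-30/100)+1, (-15/100)+1
print(100 * product_aggregate([1.25, 0.70, 0.85]))  # close to 74.375
```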

So, I want to know two things:
1. Is it possible to do this without joining the table to itself like I did there?
2. Is there a better way to do product aggregation on SQL Server 2008 R2? (I know there is one in SQL Server 2012, but that is not an option for me.)

Sorry for the L-O-N-G question, but thanks in advance!

  • Seems that your join is returning much more rows than expected (check `AND st2.[Date] <= st1.[Date]` condition). You should get always one row per `ID`, right? Have you checked Execution plan what is the slowest part? – Jan Zahradník May 25 '15 at 11:46
  • Actually, to calculate one row I need the aggregated product of all previous values; that is why I had to use the `st2.[Date] <= st1.[Date]` part. Let me explain:
    for the 2nd value (125) the calculation was 100*((25/100)+1)
    for the 3rd value (87.5) it was 125*((-30/100)+1); it is not possible to take 125 at runtime, so it had to be done as 100*((25/100)+1) * ((-30/100)+1)
    for the 4th value (74.375) it is 100*((25/100)+1) * ((-30/100)+1) * ((-15/100)+1)
    and so on... @Jan Zahradník
    – NBM21 May 25 '15 at 11:56
  • From the description it seems that 3rd row is calculated based only on the 2nd, not 1st and 2nd together. Also the results says that you use only the value from previous month. – Jan Zahradník May 25 '15 at 11:59
  • You are right, the value comes from the generated value of the previous month. But as it is all done in a single execution of the query, it is not possible to get that previous value at runtime. @Jan Zahradník – NBM21 May 25 '15 at 12:04
  • Check [Common table expressions](https://technet.microsoft.com/en-us/library/ms186243%28v=sql.105%29.aspx), its very useful for recursive queries – Jan Zahradník May 25 '15 at 12:11
  • Maybe I could not explain the scenario properly. Let me edit my question a little to describe a little more. @JanZahradník – NBM21 May 25 '15 at 12:12
  • 1
    In SQL Server 2012+, you can use the cumulative sum function. However, in SQL Server 2008, I think any approach (without a cursor) will have similar performance to what you are doing now. – Gordon Linoff May 25 '15 at 12:30
  • It would be a relief if this is one of the best possible methods for doing this kind of work on 2008 R2 server. Thanks for the comment. @GordonLinoff – NBM21 May 25 '15 at 12:36
  • 1
    There are two ways: recursion, which is simple but slow, or tricks with `LOG` and `EXP`, which are not easy but are fast compared to recursion. – Giorgi Nakeuri May 25 '15 at 12:39
  • So is it safe to say that this method is faster than recursive implementation? @GiorgiNakeuri – NBM21 May 25 '15 at 12:50
  • @NBM21, it should be, but you can check. – Giorgi Nakeuri May 25 '15 at 12:51

1 Answer


I've run several reports that make heavy use of recursion and the results have usually been very acceptable and not slow at all. Give this solution a shot:

-- http://stackoverflow.com/questions/30437219/aggregated-product-generation-on-runtime-for-sql-server-2008-r2

-- Create temp table to hold sample data
Create table #sampleTable
(   ID int
,   YrMnth date not null
,   CurrentValue numeric(13,3)
);

-- Insert sample data into the temp table
-- Date values have an added '01' at the end to make them compatible with the "date" datatype
insert into #sampleTable
values  (1,'20131201',100)
    ,   (1,'20140101',25)
    ,   (1,'20140201',-30)
    ,   (1,'20140301',-15)
    ,   (1,'20140401',50)
    ,   (1,'20140501',70)
    ,   (2,'20100901',100)
    ,   (2,'20101001',1.15)
    ,   (2,'20101101',1.79)
    ,   (2,'20101201',0.82)
    ,   (2,'20110101',1.8)
    ,   (2,'20110201',1.67);

-- Declare recursive CTE which loads the first values for each ID as the anchor
With CTE 
as 
(
Select      0 as lvl
        ,   minID.ID
        ,   minID.YrMnth
        ,   s.CurrentValue
From    #sampleTable s
        inner join (select  ID
                        ,   min(YrMnth) as 'YrMnth'
                    from    #sampleTable
                    group by ID) as minID
            on s.ID = minID.ID
            and s.YrMnth = minID.YrMnth

union all

-- Add the recursive part which unions on the same ID and +1 month for the date
-- Note that the cast in the calculation is required to prevent datatype errors between anchor and recursive member
select      cte.lvl + 1 as lvl
        ,   CTE.ID
        ,   S2.YrMnth
        ,   cast(CTE.CurrentValue * ((s2.CurrentValue / 100) + 1) as numeric(13,3))
        --, s2.CurrentValue
from    #sampleTable s2
        inner join CTE
            on s2.ID = CTE.ID
            and S2.YrMnth = dateadd(month,1,cte.YrMnth)

)

-- Select final result set
Select      *
from        CTE
order by    ID
        ,   YrMnth
        ,   lvl;

-- Clean up temp table
drop table #sampleTable;

I had to add a day part to your date values so I could treat them as the date datatype, which allows the recursive member to join on "month + 1". I added the "lvl" column just to check the recursion results, but I left it in because it's useful for seeing how many recursions a particular record has gone through. How fast this runs will depend on your total data size, but I'm pretty sure it will be faster than your original solution. Note that this solution assumes the dates for a given ID are sequential with no missing months.
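For readers comparing the two approaches, the recursive CTE boils down to a simple per-ID walk. This Python sketch (an illustration, not the answer's code) mirrors the anchor member (the seed row's value is used as-is) and the recursive member (multiply month by month):

```python
# (id, yyyymm, value) rows, seed rows (value 100) included, as in the temp table.
rows = [
    (1, 201312, 100), (1, 201401, 25), (1, 201402, -30),
    (1, 201403, -15), (1, 201404, 50), (1, 201405, 70),
    (2, 201009, 100), (2, 201010, 1.15), (2, 201011, 1.79),
    (2, 201012, 0.82), (2, 201101, 1.8), (2, 201102, 1.67),
]

series = {}
for id_, date, value in sorted(rows):  # sort by ID, then date
    if id_ not in series:
        # Anchor member: the first month's value is the seed aggregate itself.
        series[id_] = [(date, float(value))]
    else:
        # Recursive member: previous aggregate times ((value / 100) + 1).
        prev = series[id_][-1][1]
        series[id_].append((date, prev * ((value / 100) + 1)))

for id_, s in series.items():
    print(id_, s)
```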