How to calculate Running Multiplication

Question

I have two tables

WAC table

ID  wac_inc             item
--  -----------------   ----
1   2.310000000000000   A
2   1.100000000000000   A
3   2.130000000000000   A
4   1.340000000000000   A

Baseline Table

item    baseline
----    ------------------
A       10.000000000000000

Expected Result

ID  wac_inc             item    Running_Mul   
--  -----------------   ----    -----------
1   2.310000000000000   A       10.231     --  10 * (1+(2.310000000000000/100))
2   1.100000000000000   A       10.343541  --  10.231 * (1+(1.100000000000000/100))
3   2.130000000000000   A       10.563858  --  10.343541 * (1+(2.130000000000000/100))
4   1.340000000000000   A       10.705413  --  10.563858 * (1+(1.340000000000000/100))

Formula to find running_mul is

Baseline * (1 + (wac_inc/100))

SQLFIDDLE

here for every row previous row Running_Mul value is the baseline and for the first row baseline will be coming from baseline table.

Hope i made it clear. AFAIK we can do this using CURSOR but i want to avoid RBAR as much as possible. Can anyone suggest me the better way of doing it.

You could take the logarithm and compute a running sum since `exp(log(a) + log(b)) = a * b`. Not sure how much numeric precision this preserves. — usr, Apr 07 '15 at 09:41
@usr - sorry, can post the same as answer with little explanation. — Pரதீப், Apr 07 '15 at 10:42
@Fireblade, what kind of details would you like to see in answers? The problem of calculating running totals has been discussed many times, here is one good example: http://stackoverflow.com/questions/11310877/calculate-running-total-running-balance Every method shown there applies to calculating running multiplication either directly, or by converting `MUL` into `SUM` using `EXP(SUM(LOG(val)))`. — Vladimir Baranov, Apr 09 '15 at 22:43
@VladimirBaranov - Am looking for complete answer which will work in Sql server 2008 with index options. `LOG` function answers were failing when LOG function gets the value less than `one` — Pரதீப், Apr 12 '15 at 04:31
@Fireblade, please add the following details to your question to get a correct detailed answer: roughly how many rows are in `Baseline` table? how many rows are in `wac` table? how are they distributed for each item? is it allowed to create indexes? Result set should have all rows from `wac` or only rows for one given `item`? Example 1: `baseline` has 100 rows, `wac` has 100K rows, each item has ~1000 rows. Example 2: `baseline` has 10K rows, `wac` has 100K rows, most items have 4 rows, but 6 items have 10K rows. Result set should have all rows from `wac` plus running product per each item. — Vladimir Baranov, Apr 12 '15 at 11:04
@Fireblade, knowing the sizes of your tables it will be possible to write a script that generates test data and compare different solutions. — Vladimir Baranov, Apr 12 '15 at 11:10

Giorgi Nakeuri · Accepted Answer · 2015-04-12T07:54:47.277

Try:

DECLARE @t TABLE
    (
      ID INT ,
      wac DECIMAL(30, 10) ,
      item CHAR(1)
    )
DECLARE @b TABLE
    (
      item CHAR(1) ,
      baseline DECIMAL(30, 10)
    )

INSERT  INTO @t
VALUES  ( 1, 2.31, 'A' ),
        ( 2, 1.10, 'A' ),
        ( 3, 2.13, 'A' ),
        ( 4, 1.34, 'A' )


INSERT  INTO @b
VALUES  ( 'A', 10 );


WITH    ordercte
          AS ( SELECT   * ,
                        ROW_NUMBER() OVER ( PARTITION BY item ORDER BY ID ) AS rn
               FROM     @t
             ),
        rec
          AS ( SELECT   t.item ,
                        t.ID ,
                        t.wac ,
                        t.rn ,
                        b.baseline * ( 1 + ( t.wac / 100 ) ) AS m
               FROM     ordercte t
                        JOIN @b b ON b.item = t.item
               WHERE    t.rn = 1
               UNION ALL
               SELECT   t.item ,
                        t.ID ,
                        t.wac ,
                        t.rn ,
                        c.m * ( 1 + ( t.wac / 100 ) )
               FROM     ordercte t
                        JOIN rec c ON t.item = c.item
                                      AND t.rn = c.rn + 1
             )
    SELECT  id ,
            wac ,
            item ,
            m
    FROM    rec

Output:

id  wac             item    m
1   2.3100000000    A       10.231000
2   1.1000000000    A       10.343541
3   2.1300000000    A       10.563858
4   1.3400000000    A       10.705414

EDIT1

I was trying to implement LOG EXP trick but could not manage unless @usr lead me to solution. So all credits to user @usr:

WITH    ordercte
          AS ( SELECT   t.ID ,
                        t.wac ,
                        t.item ,
                        b.baseline ,
                        ROW_NUMBER() OVER ( PARTITION BY t.item ORDER BY ID ) AS rn
               FROM     @t t
                        JOIN @b b ON b.item = t.item
             )
    SELECT  baseline
            * EXP(SUM(LOG(( 1 + ( wac / 100 ) ))) OVER ( PARTITION BY item ORDER BY rn )) AS m
    FROM    ordercte

Or just:

SELECT  t.ID, t.wac, t.item, baseline
        * EXP(SUM(LOG(( 1 + ( wac / 100 ) ))) OVER ( PARTITION BY t.item ORDER BY t.ID )) AS m
FROM    @t t
        JOIN @b b ON b.item = t.item

if ID is the field you order by.

Output:

ID  wac             item    m
1   2.3100000000    A       10.231
2   1.1000000000    A       10.343541
3   2.1300000000    A       10.5638584233
4   1.3400000000    A       10.7054141261722

EDIT2

For SQL 2008 use:

WITH    cte
          AS ( SELECT   t.ID ,
                        t.wac ,
                        t.item ,
                        baseline ,
                        ( SELECT    SUM(LOG(( 1 + ( wac / 100 ) )))
                          FROM      @t it
                          WHERE     it.item = t.item AND it.ID <= t.ID
                        ) AS e
               FROM     @t t
                        JOIN @b b ON b.item = t.item
             )
    SELECT  ID, wac, item, baseline * EXP(e) AS m
    FROM    cte

EDIT3

Here is complete solution for SQL Server 2008 with dialing with NULLs and negative values:

WITH    cte
          AS ( SELECT   t.ID ,
                        t.wac ,
                        t.item ,
                        b.baseline , 
                        ca.e,
                        ca.n,
                        ca.m
               FROM     @t t
               JOIN @b b ON b.item = t.item
               CROSS APPLY(SELECT   SUM(LOG(ABS(NULLIF( 1 +  wac / 100 , 0)))) as e,
                                    SUM(SIGN(CASE WHEN 1 +  wac / 100 < 0 THEN 1 ELSE 0 END)) AS n,
                                    MIN(ABS(1 +  wac / 100)) AS m
                          FROM      @t it
                          WHERE     it.item = t.item AND it.ID <= t.ID
                          ) ca
             )
    SELECT  ID, wac, item, baseline *
                        CASE
                            WHEN m = 0 THEN 0
                            WHEN n % 2 = 1 THEN -1 * EXP(e)
                            ELSE EXP(e) 
                        END as Result
    FROM    cte

Cursors are used when it is necessary to process the data in particular order row-by-row. Just as you need here. The set-based SQL in this case will be inefficient. Since SQL Server 2012 there are `LAG` and `LEAD` functions that help with these types of queries, but not in 2008. Of course, if your table is small, then set-based solution with `O(n*n)` complexity (self join in this answer) would run faster than `O(n)` solution with the cursor, so you need to check the performance with your real data. At the very least make sure that you have correct index. I think it should be on `(item, ID)`. — Vladimir Baranov, Apr 07 '15 at 11:40
@GiorgiNakeuri - It is **failing** when when `LOG` function get the value less than 1 — Pரதீப், Apr 12 '15 at 04:29

score 7 · Answer 2 · answered Apr 07 '15 at 16:46

7

You can transform a series of multiplications into a series of additions with the following math trick:

exp(log(a) + log(b)) = a * b

So MUL(a) is EXP(SUM(LOG(a))).

SELECT SUM(val) AS [Sum], EXP(SUM(LOG(val))) AS Product
FROM (VALUES 
    (1), (2), (3), (4)
) x(val)

This emits sum = 10, product = 24.

Potential problems are rounding errors and zero factors.

You can now use one of the usual ways to achieve a running aggregate such as windowing functions. That's a solved problem.

answered Apr 07 '15 at 16:46

usr

168,620
35
240
369

I was trying to implement this also but couldn't manage to do this. The main problem here is to get previous EDITED value and not the previous value. It is not just a running multiplication, it is running formula. I came to conclusion that there are only two way to achieve this: cursors and recursive cte. If this is incorrect conclusion I would be very happy to see EXP LOG trick implementation on this question. – Giorgi Nakeuri Apr 08 '15 at 12:40
1

@GiorgiNakeuri isn't `(1 + (wac_inc/100))` the factor that you want to multiply aggregate? That would be `EXP(SUM(LOG((1 + (wac_inc/100)))))`. – usr Apr 08 '15 at 12:51
@GiorgiNakeuri, I've added a complete working solution for SQL Server 2012 that uses this trick. The same trick can be used in all set-based variants for calculating running totals including non-recursive CTE, such as self-joins. – Vladimir Baranov Apr 08 '15 at 13:54

score 5 · Answer 3 · edited May 23 '17 at 10:28

For the sake of completeness here is a full solution for SQL Server 2012 that uses the EXP(SUM(LOG(val))) trick suggested by @usr in another answer.

WITH
CTE
AS
(
    SELECT
        0 AS ID
        ,item
        ,baseline AS wac_inc
        ,baseline AS m
    FROM baseline

    UNION ALL

    SELECT
        ID
        ,item
        ,wac_inc
        ,1 + wac_inc/100 AS m
    FROM wac
)
SELECT
    ID
    ,item
    ,wac_inc
    ,m
    ,EXP(SUM(LOG(m)) OVER (PARTITION BY item ORDER BY ID ROWS UNBOUNDED PRECEDING)) AS MulRows
FROM CTE;

result set

ID  item wac_inc            m                   MulRows
0   A   10.000000000000000  10.000000000000000  10
1   A   2.310000000000000   1.023100000000000   10.231
2   A   1.100000000000000   1.011000000000000   10.343541
3   A   2.130000000000000   1.021300000000000   10.5638584233
4   A   1.340000000000000   1.013400000000000   10.7054141261722

If SQL Server 2012 is available, this window SUM is very efficient. For previous versions any set-based solution would result in O(n*n) complexity, which means that cursor would be a better way. Here is a very good article by Aaron Bertrand comparing different methods of calculating running totals: http://sqlperformance.com/2012/07/t-sql-queries/running-totals Or SO question: Calculate running total / running balance

Of course, if your table is small, then set-based solution with O(n*n) complexity may run faster than O(n) solution with the cursor due to cursor overhead, so you need to check the performance with your real data.

@GiorgiNakeuri, there is no recursion here. I just UNIONed baseline with the main data to give it a starting point for the running total. Your second solution for SQL Server 2012 under EDIT1 where you `JOIN` baseline with the main data may be better than this `UNION`, because it should be able to use indexes more efficiently. My `UNION` variant may need extra sort. It is necessary to check the execution plan. — Vladimir Baranov, Apr 08 '15 at 23:01
Sorry, when I see union all I fought that was recursion and didnt notice that there was no join on cte. — Giorgi Nakeuri, Apr 09 '15 at 06:12

void · Answer 4 · 2015-04-09T10:46:34.097

3

you could do it easily with a recursive cte:

with rec(id ,wi,i,r) as 
(
  select top (1) w.ID,w.wac_inc,w.item, b.baseline * (1 + (w.wac_inc/100))
  from wac w join baseline b on w.item=b.item
  union all
  select w.ID,w.wac_inc,w.item, r.r * (1 + (w.wac_inc/100))
  from wac w
  join rec r on (w.ID)-1 = r.id
 )
 select * from rec

Output:

1   2.31    A   10.231
2   1.1     A   10.343541
3   2.13    A   10.563858
4   1.34    A   10.705414

check in the demo

EDIT - Adding another solution:

you can do it by taking help from a copy of your original table :

Assuming your schema and data is:

create  table wac
  (ID int,wac_inc numeric(38,15),item char )

insert wac
values (1,2.31,'A'),
(2,1.1,'A'),
(3,2.13,'A'),
(4,1.34,'A')

1.take a copy from original table(use a temp table or a table variable) and update the first record from baseline table:

create table  #tmp (ID int,wac_inc numeric(38,15),item char, Running_Mul numeric(38,15))
insert into #tmp select id,wac_inc,item,null from wac

update #tmp set Running_Mul = (select top 1 baseline from baseline)*(1+(wac_inc/100))
where id = (select min(id) from #tmp)

2.declare these variables:

declare @id int,@rm numeric(38,15)
select @id=min(id) from #tmp

select @rm=Running_Mul from #tmp where id=@id

3.update the copy:

update #tmp
set @rm=Running_Mul= case 
                    when @id <> id then @rm*(1+(wac_inc/100))
                    else Running_Mul
                 end,
@id=id

and now you can check the result:

select * from #tmp
drop table #tmp

Result:

ID  wac_inc             item    Running_Mul
1   2.310000000000000   A   10.231000000000000
2   1.100000000000000   A   10.343541000000000
3   2.130000000000000   A   10.563858000000000
4   1.340000000000000   A   10.705414000000000

edited Apr 09 '15 at 10:46

answered Apr 07 '15 at 14:10

void

7,760
3
25
43

"Easily"... (A valid answer, though.) – usr Apr 07 '15 at 16:43
@usr , wasn't it good to say that yes? (I mean the word I used). I wish I didn't use it. however I think it's not important and has not a negative meaning. – void Apr 07 '15 at 17:24
1

Your second solution is called "quirky update" by Aaron Bertrand http://sqlperformance.com/2012/07/t-sql-queries/running-totals He says: "This is not exactly guaranteed to work, and I would never recommend it for production code". Indeed there is no guarantee that this running total would be calculated in some particular order. – Vladimir Baranov Apr 09 '15 at 12:48
A trivial check, that should be easy for you, since you've got everything set up: try to create a clustered index on `ID DESC` on your `#tmp` table. Then try with a clustered index on, say, `wac_inc`. Would you still observe the same result? If results are different - it is a direct proof that the method is unreliable. If results are the same - it doesn't prove or disprove anything. There may be other reasons/ways when the server chooses a different order of updating rows. – Vladimir Baranov Apr 09 '15 at 12:53
@VladimirBaranov, first: the `#tmp` or a `variable table` is just a helper and it's not considered or needed to create an cluster index on it so please don't say _your solution will not work under undesired rules_, second: you can [see here](http://www.sqlservercentral.com/articles/T-SQL/68467/) for more info about that method and see how it works. however my main suggested solution was the first one and just to for saying a different solution I provided it. – void Apr 09 '15 at 13:12
I did read [that long article](http://www.sqlservercentral.com/articles/T-SQL/68467/). The author explicitly says that "clustered index must be present in the correct order" plus 9 more rules that you have to follow. Author admits: "just stop telling me that it's an undocumented hack... I already know that". It is undocumented, which means that the fact that the author has observed correct behaviour of this "quirky update" in his tests doesn't prove or guarantee that it will continue to work correctly in future versions of SQL Server or under some circumstances which author didn't think of. – Vladimir Baranov Apr 09 '15 at 22:34

How to calculate Running Multiplication

4 Answers4

Linked

Related