41

I am trying to perform the following calculation.

Sample data:

CREATE TABLE #Table1
  (
     rno   int identity(1,1),
     ccp   varchar(50),
     [col1] INT,
     [col2] INT,
     [col3] INT,
     col4 as [col2]/100.0
  );

INSERT INTO #Table1
            (ccp,[col1],[col2],[col3])
VALUES      ('ccp1',15,10,1100),
            ('ccp1',20,10,1210),
            ('ccp1',30,10,1331),
            ('ccp2',10,15,900),
            ('ccp2',15,15,1000),
            ('ccp2',20,15,1010)

+-----+------+------+------+------+----------+
| rno | ccp  | col1 | col2 | col3 |   col4   |
+-----+------+------+------+------+----------+
|   1 | ccp1 |   15 |   10 | 1100 | 0.100000 |
|   2 | ccp1 |   20 |   10 | 1210 | 0.100000 |
|   3 | ccp1 |   30 |   10 | 1331 | 0.100000 |
|   4 | ccp2 |   10 |   15 |  900 | 0.150000 |
|   5 | ccp2 |   15 |   15 | 1000 | 0.150000 |
|   6 | ccp2 |   20 |   15 | 1010 | 0.150000 |
+-----+------+------+------+------+----------+

Note: this is not limited to 3 records; each ccp can have any number (N) of records.

Expected Result :

1083.500000 --1100 - (15 * (1+0.100000))
1169.850000 --1210 - ((20 * (1+0.100000)) + (15 * (1+0.100000)* (1+0.100000)) )
1253.835000 --1331 - ((30 * (1+0.100000)) + (20 * (1+0.100000)* (1+0.100000)) + (15 * (1+0.100000)* (1+0.100000) *(1+0.100000)) )
888.500000  --900 - (10 * (1+0.150000))
969.525000  --1000 - ((15 * (1+0.150000)) + (10 * (1+0.150000)* (1+0.150000)) )
951.953750  --1010 - ((20 * (1+0.150000)) + (15 * (1+0.150000)* (1+0.150000)) + (10 * (1+0.150000)* (1+0.150000) *(1+0.150000)) )
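
The pattern in the comments above can be written generally. This is a reading of the six examples, not something stated elsewhere in the post: number the rows of one ccp as k = 1..n in rno order, and let r be that ccp's col4 (assumed constant within the ccp, as it is in the sample data).

```latex
\mathrm{Result}_n \;=\; \mathrm{col3}_n \;-\; \sum_{k=1}^{n} \mathrm{col1}_k \,(1 + r)^{\,n-k+1}
```

For n = 2 in ccp1 this gives 1210 - (20 * 1.1 + 15 * 1.1^2) = 1169.85, matching the second line above.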

I know this can be done with a recursive CTE, but that is not efficient because I have to do this for more than 5 million records.

I am looking to implement something like this set-based approach:

For ccp : ccp1

SELECT col3 - ( col1 * ( 1 + col4 ) )
FROM   #Table1
WHERE  rno = 1

SELECT rno,
       col3 - ( ( col1 * Power(( 1 + col4 ), 1) ) + ( Lag(col1, 1)
                                                        OVER(
                                                          ORDER BY rno ) * Power(( 1 + col4 ), 2) ) )
FROM   #Table1
WHERE  rno IN ( 1, 2 )

SELECT rno,
       col3 - ( ( col1 * Power(( 1 + col4 ), 1) ) + ( Lag(col1, 1)
                                                        OVER(
                                                          ORDER BY rno ) * Power(( 1 + col4 ), 2) ) + ( Lag(col1, 2)
                                                                                                          OVER(
                                                                                                            ORDER BY rno ) * Power(( 1 + col4 ), 3) ) )
FROM   #Table1
WHERE  rno IN ( 1, 2, 3 ) 

Is there a way to calculate this in a single query?

Update :

I am still open to suggestions. I strongly believe there should be some way to do this using the SUM() OVER(ORDER BY) window aggregate function.

Pரதீப்

6 Answers

30

I finally achieved the result using the approach below:

SELECT a.*,
       col3 - res AS Result
FROM   #TABLE1 a
       CROSS apply (SELECT Sum(b.col1 * Power(( 1 + b.COL2 / 100.00 ), new_rn)) AS res
                    FROM   (SELECT Row_number()
                                     OVER(
                                       partition BY ccp
                                       ORDER BY rno DESC) new_rn,*
                            FROM   #TABLE1 b
                            WHERE  a.ccp = b.ccp
                                   AND a.rno >= b.rno)b) cs

Result :

+-----+------+------+------+------+----------+-------------+
| rno | ccp  | col1 | col2 | col3 |   col4   |   Result    |
+-----+------+------+------+------+----------+-------------+
|   1 | ccp1 |   15 |   10 | 1100 | 0.100000 | 1083.500000 |
|   2 | ccp1 |   20 |   10 | 1210 | 0.100000 | 1169.850000 |
|   3 | ccp1 |   30 |   10 | 1331 | 0.100000 | 1253.835000 |
|   4 | ccp2 |   10 |   15 |  900 | 0.150000 | 888.500000  |
|   5 | ccp2 |   15 |   15 | 1000 | 0.150000 | 969.525000  |
|   6 | ccp2 |   20 |   15 | 1010 | 0.150000 | 951.953750  |
+-----+------+------+------+------+----------+-------------+
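
To make the role of the descending ROW_NUMBER() explicit, here is a minimal Python sketch of the same arithmetic. It is only an illustration of what the CROSS APPLY computes for the sample rows, not a replacement for the query; the function name `results_by_definition` and the tuple layout are assumptions made for this sketch.

```python
# Sample rows from the question: (rno, ccp, col1, col2, col3)
ROWS = [
    (1, "ccp1", 15, 10, 1100),
    (2, "ccp1", 20, 10, 1210),
    (3, "ccp1", 30, 10, 1331),
    (4, "ccp2", 10, 15, 900),
    (5, "ccp2", 15, 15, 1000),
    (6, "ccp2", 20, 15, 1010),
]


def results_by_definition(rows):
    """Hypothetical helper: for each row, sum col1 * (1 + col2/100) ** new_rn
    over the rows of the same ccp up to and including the current one."""
    out = {}
    for rno, ccp, _col1, _col2, col3 in rows:
        # Rows of the same ccp with rno <= current rno, newest first.
        # This plays the role of ROW_NUMBER() ... ORDER BY rno DESC:
        # the current row gets new_rn = 1, the one before it 2, and so on.
        window = sorted(
            (r for r in rows if r[1] == ccp and r[0] <= rno),
            key=lambda r: r[0],
            reverse=True,
        )
        res = sum(
            b_col1 * (1 + b_col2 / 100.0) ** new_rn
            for new_rn, (_, _, b_col1, b_col2, _) in enumerate(window, start=1)
        )
        out[rno] = col3 - res
    return out


if __name__ == "__main__":
    for rno, value in results_by_definition(ROWS).items():
        print(rno, round(value, 6))
```

Like the query, this touches every earlier row of the ccp for each row, so the work grows quadratically with the number of rows per ccp.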
Pரதீப்
  • 7
    Well done! I can't remember the last time I saw a problem as cool as this one on SO. – Veljko89 Jan 23 '17 at 12:09
  • @Pரதீப் Impressive. Did you rewrite this to use window functions (ORDER BY UNBOUNDED PRECEDING)? Would it improve the readability? This is actually pretty compact and easy to read. Is the performance you mentioned above (15 to 17 seconds) from this query? – suresubs Dec 24 '17 at 16:05
  • 1
    @suresubs - It's impossible to write this query using aggregate window functions. Trust me, I have done enough research. – Pரதீப் Dec 24 '17 at 16:08
  • NB: You're using `b.COL2 / 100.00` instead of `COL4`. If you have the option to change `COL4` to `PERSISTED`, you may get another slight boost. – JohnLBevan Dec 07 '18 at 16:55
10

This answer may be disappointing, but you'll likely find that an iterative CLR approach performs competitively with any T-SQL approach.

Try the following (based on Running sums yet again: SQLCLR saves the day!)

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class StoredProcedures
{
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void StackoverflowQuestion41803909()
    {
        using (SqlConnection conn = new SqlConnection("context connection=true;"))
        {
            SqlCommand comm = new SqlCommand();
            comm.Connection = conn;
            comm.CommandText = @"
SELECT [rno],
       [ccp],
       [col1],
       [col2],
       [col3],
       [col4]
FROM   Table1
ORDER  BY ccp,
          rno 
";

            SqlMetaData[] columns = new SqlMetaData[7];
            columns[0] = new SqlMetaData("rno", SqlDbType.Int);
            columns[1] = new SqlMetaData("ccp", SqlDbType.VarChar, 50);
            columns[2] = new SqlMetaData("col1", SqlDbType.Int);
            columns[3] = new SqlMetaData("col2", SqlDbType.Int);
            columns[4] = new SqlMetaData("col3", SqlDbType.Int);
            columns[5] = new SqlMetaData("col4", SqlDbType.Decimal, 17, 6);
            columns[6] = new SqlMetaData("result", SqlDbType.Decimal, 17, 6);

            SqlDataRecord record = new SqlDataRecord(columns);

            SqlContext.Pipe.SendResultsStart(record);

            conn.Open();

            SqlDataReader reader = comm.ExecuteReader();

            string prevCcp = null;
            decimal offset = 0;

            while (reader.Read())
            {
                string ccp = (string)reader[1];
                int col1 = (int)reader[2];
                int col3 = (int)reader[4];
                decimal col4 = (decimal)reader[5];

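                // Reset the running total at the start of each ccp group.
                if (prevCcp != ccp)
                {
                    offset = 0;
                }

                // Add the current col1 to the carried total, then grow the whole
                // amount by one period: offset = (col1 + offset) * (1 + col4).
                offset = ((col1 + offset) * (1 + col4));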
                record.SetInt32(0, (int)reader[0]);
                record.SetString(1, ccp);
                record.SetInt32(2, col1);
                record.SetInt32(3, (int)reader[3]);
                record.SetInt32(4, col3);
                record.SetDecimal(5, col4);
                record.SetDecimal(6, col3 - offset);

                SqlContext.Pipe.SendResultsRow(record);

                prevCcp = ccp;
            }

            SqlContext.Pipe.SendResultsEnd();
        }
    }
};
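
The core of the procedure above is the single update `offset = (col1 + offset) * (1 + col4)`, which carries the previous total forward one period, so each row costs constant work. The following Python sketch replays that loop on the question's sample data to show it reproduces the expected results. It is an illustration only: the function name `running_results` is hypothetical, and the rate is derived from col2 rather than read from col4.

```python
# Sample rows from the question, ordered by ccp then rno:
# (rno, ccp, col1, col2, col3)
ROWS = [
    (1, "ccp1", 15, 10, 1100),
    (2, "ccp1", 20, 10, 1210),
    (3, "ccp1", 30, 10, 1331),
    (4, "ccp2", 10, 15, 900),
    (5, "ccp2", 15, 15, 1000),
    (6, "ccp2", 20, 15, 1010),
]


def running_results(rows):
    """Hypothetical helper mirroring the CLR loop: one pass, O(1) work per row.
    Assumes rows arrive ordered by ccp, then rno."""
    out = {}
    prev_ccp = None
    offset = 0.0
    for rno, ccp, col1, col2, col3 in rows:
        if ccp != prev_ccp:
            offset = 0.0  # new ccp group: restart the running total
        # Same update as the C# code, with col4 computed as col2 / 100.
        offset = (col1 + offset) * (1 + col2 / 100.0)
        out[rno] = col3 - offset
        prev_ccp = ccp
    return out


if __name__ == "__main__":
    for rno, value in running_results(ROWS).items():
        print(rno, round(value, 6))
```

For ccp1 the offset goes 16.5, then (20 + 16.5) * 1.1 = 40.15, then (30 + 40.15) * 1.1 = 77.165, which gives 1083.5, 1169.85 and 1253.835.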
Martin Smith
5

Another option

CREATE TABLE #Table1
  (
     rno   int identity(1,1),
     ccp   varchar(50),
     [col1] INT,
     [col2] INT,
     [col3] INT,
     col4 as [col2]/100.0
  );

INSERT INTO #Table1
            (ccp,[col1],[col2],[col3])
VALUES      ('ccp1',15,10,1100),
            ('ccp1',20,10,1210),
            ('ccp1',30,10,1331),
            ('ccp1',40,10,1331),
            ('ccp2',10,15,900),
            ('ccp2',15,15,1000),
            ('ccp2',20,15,1010);

select t.*, col3-s
from(
    select *, rn = row_number() over(partition by ccp order by rno)
    from #Table1
) t
cross apply (
    select s=sum(pwr*col1)
    from(
        select top(rn)
           col1, pwr = power(1+col4, rn + 1 - row_number() over(order by rno))
        from #Table1 t2
        where t2.ccp=t.ccp
        order by row_number() over(order by rno)
        )t3
    )t4
order by rno;
Serg
4

An approach with a self join. Not sure if this would be any more efficient than your version with cross apply.

WITH T AS
  (SELECT *,
          ROW_NUMBER() OVER(PARTITION BY CCP
                            ORDER BY RNO) AS RN
   FROM #TABLE1)
SELECT T1.RNO,
       T1.CCP,
       T1.COL1,
       T1.COL2,
       T1.COL3,
       T1.COL3-SUM(T2.COL1*POWER(1+T1.COL2/100.0,T1.RN-T2.RN+1)) AS RES
FROM T T1
JOIN T T2 ON T1.CCP=T2.CCP
AND T1.RN>=T2.RN
GROUP BY T1.RNO,
         T1.CCP,
         T1.COL1,
         T1.COL2,
         T1.COL3

Sample Demo

Vamsi Prabhala
  • 1
    Tested it with 3 million records (`50000` ccps and `60` rnos for each ccp). My query took **15-17** seconds. Yours took **60-65** seconds. – Pரதீப் Jan 24 '17 at 12:33
3

Try this:

;with 
    val as (
        select 
            *, 
            (1 + col2 / 100.00) val,
            row_number() over(partition by ccp order by rno desc) rn
        from #Table1),
res as (
        select 
            v1.rno, 
            --min(v1.ccp) ccp,
            --min(v1.col1) col1, 
            --min(v1.col2) col2, 
            min(v1.col3) col3, 
            sum(v2.col1 * power(v2.val, 1 + v2.rn - v1.rn)) sum_val
        from val v1
        left join val v2 on v2.ccp = v1.ccp and v2.rno <= v1.rno
        group by v1.rno)
select *, col3 - isnull(sum_val, 0)
from res

Performance depends on the indexes, though. Post your index structure for more specific advice. You will probably get the best performance by splitting the work across several temporary tables.

Deadsheep39
2

After playing with it for some time I believe the answer to the bounty question of whether or not this can be done with a sum() over (order by) is NO. This code is as close as I could get:

select  *, col3 - sum(col1 * power(1 + col4, row_num)) over (partition by ccp order by col1)
from    (
        select  *, row_number() over (partition by ccp order by rno asc) row_num
        from    #Table1
        ) a
order   by 1,2;

This returns correct results for the first row in each ccp group. If row_num is calculated using rno desc instead, the final row in each ccp will be correct.

It appears that the only ways to get this to work in the simple way that the syntax suggests would be:

  1. Syntax support to reference the actual row within the aggregate over function. This does not exist in T-SQL as far as I can find.
  2. Syntax support for a window function within a window function. This also is not permitted in T-SQL per the following error:

Windowed functions cannot be used in the context of another windowed function or aggregate.

This was an interesting problem. I'd be curious how this solution performs against your large dataset even though the actual result is incorrect.
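
The obstacle described above is that each term's exponent depends on the current row. Under one extra assumption, that col4 is the same for every row of a ccp (true of the sample data, not guaranteed by the question), that dependence can be factored out algebraically: the sum of col1_k * (1+r)^(n-k+1) equals (1+r)^(n+1) times a plain running total of col1_k / (1+r)^k. The Python sketch below only checks that identity against the sample data; the function name `results_by_factoring` is hypothetical, and it is untested as a T-SQL query, where decimal precision and overflow for long groups would need to be checked.

```python
from itertools import groupby

# Sample rows from the question, ordered by ccp then rno:
# (rno, ccp, col1, col2, col3)
ROWS = [
    (1, "ccp1", 15, 10, 1100),
    (2, "ccp1", 20, 10, 1210),
    (3, "ccp1", 30, 10, 1331),
    (4, "ccp2", 10, 15, 900),
    (5, "ccp2", 15, 15, 1000),
    (6, "ccp2", 20, 15, 1010),
]


def results_by_factoring(rows):
    """Hypothetical helper. Assumes col2 (hence col4) is constant within a ccp
    and that rows are ordered by ccp, then rno."""
    out = {}
    for _ccp, group in groupby(rows, key=lambda r: r[1]):
        running = 0.0  # ordinary running total, no per-row exponent shift
        for n, (rno, _, col1, col2, col3) in enumerate(group, start=1):
            growth = 1 + col2 / 100.0
            running += col1 / growth ** n             # term depends only on its own row
            out[rno] = col3 - running * growth ** (n + 1)  # rescale for the current row
    return out


if __name__ == "__main__":
    for rno, value in results_by_factoring(ROWS).items():
        print(rno, round(value, 6))
```

Here `running` is the kind of quantity a windowed running SUM over rno can express, because each term uses only values from its own row and its own row number; the multiplication by `growth ** (n + 1)` happens outside the sum.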

SMM