5

I have the following table:

CREATE TABLE tbl_proc(
    [proc] float,
    subscriber bigint   
)

data:

proc | subscriber
-----|-----------
0.7  | 123456   
0.5  | 1234567  
0.3  | 12345    
0.3  | 45678    
0.3  | 1234 
0.2  | 123455   
0.1  | 894562   

I would like to find a nice method to add a new column to the table that holds the running sum of the proc values, accumulated in the order shown above.

Result:

proc | subscriber | col3
-----|------------|------------
0.7  | 123456     | 0.7
0.5  | 1234567    | 1.2 -- 0.7 + proc
0.3  | 12345      | 1.5
...

I found the following method:

SELECT a.[proc], SUM(b.[proc])
FROM tbl_proc a, tbl_proc b
WHERE a.[proc] <= b.[proc]
  AND (a.[proc] <> b.[proc] OR a.subscriber >= b.subscriber)
GROUP BY a.[proc], a.subscriber
ORDER BY a.[proc] DESC

In my table the data is sorted desc by proc. Also the subscriber column is unique.

The method I found is a bit too expensive (my tables are large). For performance reasons I did not consider a cursor-based solution.

Any suggestions?
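For reference, the triangular join above effectively orders rows by proc descending, breaking ties by subscriber ascending (note this differs from the display order of the 0.3 rows, which is exactly the ordering ambiguity raised in the answers), and sums every row at or before the current one. A minimal Python sketch (not T-SQL) of that running-total logic, using the sample data from the question, to sanity-check the expected col3 values:

```python
from itertools import accumulate

# Sample rows from the question: (proc, subscriber)
rows = [
    (0.7, 123456),
    (0.5, 1234567),
    (0.3, 12345),
    (0.3, 45678),
    (0.3, 1234),
    (0.2, 123455),
    (0.1, 894562),
]

# The ordering the triangular join's WHERE clause implies:
# proc DESC, ties broken by subscriber ASC
rows.sort(key=lambda r: (-r[0], r[1]))

# Running total over proc in a single O(n) pass
col3 = list(accumulate(p for p, _ in rows))

for (p, s), total in zip(rows, col3):
    print(f"{p:<4} | {s:<10} | {round(total, 10)}")
```

The first two totals (0.7, 1.2) match the question's expected result; within the tied 0.3 rows the order, and therefore the per-row totals, depend on the tie-break.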


Update:

I googled the problem a bit more and found the "Update to a local variable" solution on this page:

http://geekswithblogs.net/Rhames/archive/2008/10/28/calculating-running-totals-in-sql-server-2005---the-optimal.aspx

As far as I tested this proves to be the best solution so far.

declare @runningTotal float = 0

-- new_col must already exist (e.g. ALTER TABLE tbl_proc ADD new_col float)
UPDATE tbl_proc SET @runningTotal = new_col = @runningTotal + [proc] FROM tbl_proc
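One of the known risks with this quirky-update technique (see the comment about "the rules") is that the result depends entirely on the physical order in which rows are touched. A hypothetical Python sketch of the same accumulate-into-a-variable pattern, showing that two different scan orders produce different per-row totals:

```python
def quirky_totals(rows):
    """Mimic @runningTotal = new_col = @runningTotal + [proc]:
    accumulate into one variable in whatever order rows arrive."""
    running = 0.0
    return [round(running := running + p, 10) for p, _ in rows]

rows = [
    (0.7, 123456), (0.5, 1234567), (0.3, 12345),
    (0.3, 45678), (0.3, 1234), (0.2, 123455), (0.1, 894562),
]

desc = quirky_totals(sorted(rows, key=lambda r: -r[0]))  # proc DESC scan
asc  = quirky_totals(sorted(rows, key=lambda r: r[0]))   # proc ASC scan

# Only the grand total agrees; every intermediate row differs.
print(desc)
print(asc)
```

This is why the technique requires pinning down the scan order (clustered index, table lock, no parallelism) before it can be trusted.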

Corovei Andrei
  • 1,646
  • 6
  • 28
  • 42
  • Maybe you should *try* the cursor solution and compare the results, you might be surprised. Also, have you searched for solutions to your problem? I Googled "sql running total" and found a load of answers. – Tony Sep 06 '11 at 14:28
  • 1
    @Martin I corrected my mistake. Sry – Corovei Andrei Sep 06 '11 at 14:28
  • This kind of query will much *much*, much easier in Denali due to [enhancements to `OVER` clause](http://www.geniiius.com/blog/t-sql-enhancements-over-clause/), but agree with others that you need a further attribute to determine order. – onedaywhen Sep 06 '11 at 14:59
  • 1
    @Corovei - The method you have added to your question is the quirky update technique I refer to in my answer. Please read that link - at least the bottom section about "the rules" as there are quite a few (known) things that can go wrong. e.g. you should take a table lock and prevent parallelism. – Martin Smith Sep 06 '11 at 15:44
  • 1
    @Martin thank you very much for your help. I will consider your answer the right one because it led me to the solution. I will keep an eye however on the results of this method and if they prove to be wrong I will change the procedure with the cursor method. Thanx again :D – Corovei Andrei Sep 06 '11 at 16:11

3 Answers

9

This is generally known as calculating running totals.

There is a very fast method to do what you want to do called "quirky update" but it relies on undocumented behaviour.

Other than that, cursors are the fastest way for large sets, as their workload grows linearly whereas your triangular join's workload grows quadratically (until the next version and the improved OVER clause).
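To make the linear-vs-quadratic contrast concrete, a small sketch (illustrative counts, not a benchmark) of how the work grows: an ordered single pass touches each of the n rows once, while the triangular self-join evaluates on the order of n(n+1)/2 row pairs:

```python
def workload(n):
    # Ordered single pass (cursor / quirky update): one visit per row
    single_pass = n
    # Triangular join: each row is matched against all rows at or before it
    triangular = n * (n + 1) // 2
    return single_pass, triangular

for n in (100, 10_000, 1_000_000):
    passes, pairs = workload(n)
    print(f"n={n:>9,}: single pass {passes:>13,} vs triangular {pairs:>19,}")
```

At a million rows the triangular join is doing roughly half a trillion row comparisons, which is why it falls over on large tables.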

See this document by Itzik Ben-Gan for more about the issue.

Martin Smith
  • 438,706
  • 87
  • 741
  • 845
2

A cursor may be a good option for you. They're not always bad -- in fact, with large datasets in certain situations (like yours, I think) they can outperform set-based operations.

I looked into this a while back -- there are some good answers and comments people posted that might be relevant to what you're working on: When are TSQL Cursors the best or only option?

Community
  • 1
  • 1
Chains
  • 12,541
  • 8
  • 45
  • 62
1

Also note - this is not good practice. Storing the running total denormalizes the table, and the ORDER BY is ambiguous: nothing in the table itself guarantees a row order.

I would avoid this if I were you. Instead, write a view or query that returns the running total dynamically.

Randy
  • 16,480
  • 1
  • 37
  • 55
  • +1 for the `ORDER BY` point. Less convinced by the other point though. This is quite an expensive thing to keep recalculating. – Martin Smith Sep 06 '11 at 14:45