27

I can't understand this code's bug

ID      AccountID       Quantity
1          1               10           Sum = 10
2          1               5                = 10 + 5 = 15
3          1               2                = 10 + 5 + 2 = 17
4          2               7                = 7
5          2               3                = 7 + 3 = 10  

SELECT ID, AccountID, Quantity, 
       SUM(Quantity) OVER (PARTITION BY AccountID ) AS TopBorcT, 
FROM tCariH
Willi Mentzel
  • 27,862
  • 20
  • 113
  • 121
serkan
  • 385
  • 1
  • 4
  • 4

3 Answers3

38

Seems like you expected the query to return running totals, but it must have given you the same values for both partitions of AccountID.

To obtain running totals with SUM() OVER (), you need to add an ORDER BY sub-clause after PARTITION BY …, like this:

SUM(Quantity) OVER (PARTITION BY AccountID ORDER BY ID)

But remember, not all database systems support ORDER BY in the OVER clause of a window aggregate function. (For instance, SQL Server didn't support it until the latest version, SQL Server 2012.)

Andriy M
  • 76,112
  • 17
  • 94
  • 154
  • 11
    To clarify this: a sum() without an `order by` will simply sum all values for the group defined by the partition. Btw: SQL Server was/is the only (AFAIK) DBMS supporting windowing functions without supporting an order by in the partition clause. The others (PostgreSQL, Oracle, DB2, Teradata) do not have that limitation –  Apr 06 '12 at 08:29
  • 1
    I think you must be right about the systems with limited support of windowing aggregate functions. I did suspect that but wasn't quite sure. Having a look at another (related) question of the OP's, I can now see that SQL Server *is* the OP's database system particularly in this case, and a pre-2012 one too. – Andriy M Apr 06 '12 at 08:37
  • 1
    From what I can tell, `ORDER BY` was added in Sql 2005: http://msdn.microsoft.com/en-us/library/ms189461(v=sql.100).aspx – Robert Jeppesen Apr 26 '13 at 14:21
  • 2
    @RobertJeppesen: Yes, but only for *ranking* window functions. Support for `ORDER BY` in *aggregate* window functions was only added in SQL Server 2012. (Take a look at the [SQL Server 2005 version](http://msdn.microsoft.com/en-us/library/ms189461%28v=sql.90%29.aspx) of that manual, it's a bit clearer there.) – Andriy M Apr 26 '13 at 15:35
  • I'm unsure about the whack nature of those other DBMS at the time, but UV below has it right for SQL Server 2012. Without the "Rows Between" clause, the order by would do nothing. In this case. The Order by would still be useful if you have at least one ranking function in the analytical formula. – Gerard ONeill Mar 28 '23 at 03:46
  • @GerardONeill: Including `ROWS BETWEEN` may be beneficial for a number of reasons but I'm not sure what you mean by saying that without it `ORDER BY` would do nothing. On its own, `SUM(...) OVER (ORDER BY...)` is a [perfectly legitimate way](https://dbfiddle.uk/sizouvlp) of implementing a running total in SQL Server, particularly when the `ORDER BY` criteria define the sorting order precisely (i.e. without ties). Perhaps you could elaborate on your point or illustrate it with an example. – Andriy M Mar 29 '23 at 21:43
  • @AndriyM Hmm. Its more nuanced than I thought - if the values in the order by colum have the same value, then it is the non-cumulative sum (the total) of the column specified. If the values in the order by are different, then it shows the proper accumulation. http://sqlfiddle.com/#!18/5571b/1 So depending on the order by values, YMMV. I think its wrong. But I retract that it does nothing, and instead say it does nothing if the order by values are the same. – Gerard ONeill Mar 30 '23 at 16:42
  • 1
    @GerardONeill: It still doesn't do nothing, though. When the `ORDER BY` is non-deterministic like that, what would you expect as the output? When two values have the same ranking, the server can't decide which should be added up first or second etc. so it adds up all of them *at that point*. AFAIK this is according to the standard, not just how SQL Server decided to implement it. You will find that [other](https://dbfiddle.uk/heC1r9h9) [products](https://dbfiddle.uk/BYNvipoS) [have](https://dbfiddle.uk/TIY8vkfT) [implemented](https://dbfiddle.uk/c5cj061L) it in exactly the same way. – Andriy M Mar 31 '23 at 09:36
  • 1
    @GerardONeill: Throwing in `ROWS BETWEEN` may make it *look* better (which is subjective) but you have no guarantee that one value will *always* go first and the other second. If you want a deterministic behaviour, you just need a deterministic `ORDER BY`. I'm not bashing `ROWS BETWEEN` though, that's a different matter altogether. I'm only arguing that, as far as accuracy of the results is concerned, `ORDER BY` without `ROWS BETWEEN` in a window aggregate function works exactly as it should, SQL Server or not. – Andriy M Mar 31 '23 at 09:37
  • @AndriyM I have no problem with your arguments - it's probably best to always have a deterministic order by. However putting a sum instead of the actual accumulation up to that point is a fail IMHO. The SQL engine already created an arbitrary order to the rows. In addition, visually you'd expect the amount to be 0 if you saw two accumulated values in a row that are the same. Meh - since there is a way for my need (using rows_between), or for the total for all rows (order by 1), nothing more to say. – Gerard ONeill Mar 31 '23 at 15:34
14

if you are using SQL 2012 you should try

SELECT  ID, 
        AccountID, 
        Quantity, 
        SUM(Quantity) OVER (PARTITION BY AccountID ORDER BY AccountID rows between unbounded preceding and current row ) AS TopBorcT, 
FROM tCariH

if available, better order by date column.

UV.
  • 492
  • 6
  • 9
8

Query would be like this:

SELECT ID, AccountID, Quantity, 
       SUM(Quantity) OVER (PARTITION BY AccountID ) AS TopBorcT 

       FROM #Empl ORDER BY AccountID

Partition by works like group by. Here we are grouping by AccountID so sum would be corresponding to AccountID.

First first case, AccountID = 1 , then sum(quantity) = 10 + 5 + 2 => 17 & For AccountID = 2, then sum(Quantity) = 7+3 => 10

so result would appear like attached snapshot.

xan
  • 7,440
  • 8
  • 43
  • 65