1

I'm trying to understand how to make a partial sum with a condition on a table.

I've this table:

CREATE TABLE [dbo].[test](
    [Id] [int] NOT NULL,
    [part] [varchar](50) NULL,
    [value] [int] NULL,
    [TimeValue] [int] NULL,
 CONSTRAINT [PK_test] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO
SET ANSI_PADDING OFF
GO
INSERT [dbo].[test] ([Id], [part], [value], [TimeValue]) VALUES (1, N'part1', 50, 15)
GO
INSERT [dbo].[test] ([Id], [part], [value], [TimeValue]) VALUES (2, N'part1', 8, 12)
GO
INSERT [dbo].[test] ([Id], [part], [value], [TimeValue]) VALUES (3, N'part1', 12, 9)
GO
INSERT [dbo].[test] ([Id], [part], [value], [TimeValue]) VALUES (4, N'part1', 10, 20)
GO
INSERT [dbo].[test] ([Id], [part], [value], [TimeValue]) VALUES (5, N'part2', 6, 4)
GO
INSERT [dbo].[test] ([Id], [part], [value], [TimeValue]) VALUES (6, N'part2', 9, 15)
GO

so

Id  part    value   TimeValue
1   part1    50       15
2   part1    8        12
3   part1    12       9
4   part1    10       20
5   part2    6        4
6   part2    9        15

and I shoud write a select to add a column pSum AS INT with the sum on partition of column part AS VARCHAR(50) on TimeValue AS INT <= of current row TimeValue

I've write this:

SELECT *, SUM(value) OVER (PARTITION BY part) AS pSum FROM test

getting naturally:

Id  part    value   TimeValue   pSum
1   part1   50          15       80
2   part1   8           12       80
3   part1   12          9        80
4   part1   10          20       80
5   part2   6           4        15
6   part2   9           15       15

but this make sum on every whole partition, I have to consider TimeValue condition to get this:

Id  part    value   TimeValue   pSum
1   part1   50      15          70      (sum in part1 where TimeValue <= 15 : 12 + 8 + 50)
2   part1   8       12          20      (sum in part1 where TimeValue <= 12 : 12 + 8)
3   part1   12      9           12      (sum in part1 where TimeValue <= 9 : just 12)
4   part1   10      20          80      (sum in part1 where TimeValue <= 20 : 12 + 8 + 50)
5   part2   6       4           6       (sum in part2 where TimeValue <= 4 : just 6)
6   part2   9       15          15      (sum in part2 where TimeValue <= 15 : 9 + 6)

I've read a possible solution for SQL 2012 a solution here:

SUM(value) OVER (PARTITION BY part ORDER BY TimeValue rows between unbounded preceding and current row ) AS pSum

but I'm using SQL 2008

Please help. Thanks in advance.

Community
  • 1
  • 1
user2595496
  • 111
  • 1
  • 7

1 Answers1

2

In SQL Server 2008, there is no really efficient way to do this. One method uses outer apply:

select t.*, v.cumesum
from test t outer apply
     (select sum(t2.value)
      from test t2
      where t2.part = t.part and t2.timevalue <= t.timevalue
     ) v(cumesum);
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • 1
    Consider that someone suggests me to do a WHILE LOOP, but your solution is really better. – user2595496 Nov 14 '16 at 15:37
  • 1
    @user2595496 That is debatable. For small sets of data you can get away with `CROSS/OUTER APPLY`. For large datasets, you are better off using a cursor. Consider that the method using `APPLY` is O(n^2), the cursor (or loop) method is O(n). – TT. Nov 14 '16 at 17:27
  • @TT. . . . The method using `APPLY` is *not necessary* O(n^2). In addition, it can make use of an index on `test(part, timevalue)`. That said, under many circumstances, using a `WHILE` loop might be better. But, it is not based on overall size of the data; it is based on the maximum number of rows for a given `part` -- and the presence or absence of indexes. – Gordon Linoff Nov 15 '16 at 02:20
  • @GordonLinoff Closer look for one `part`. Suppose there are `n` rows for such a part. Barring any engine optimizations I don't know about, the number of operations is `1+2+3...+n` which equals `S(n)=n(n-1)/2`. The order of that is `O(n²)` for one part. Of course the `n` is different for each part, so it's hard to put a strict complexity on it (been too long since I've done any asympotic complexity math) but it's definitely not linear and not `O(nlogn)` either. So if I'd put a complexity on it I'd say quadratic complexity. The existence of an index has no bearing on the complexity. – TT. Nov 15 '16 at 06:48
  • @GordonLinoff Maybe it's more like `O(n²m)` where `n` is the average number of rows for each part, and `m` is the number of parts. For the linear version (`SUM() OVER`) it would be `O(nm)`. That's what I wanted to highlight anyway. I'm not saying the complexity math is 100% correct here, but I won't be too far off. – TT. Nov 15 '16 at 06:53
  • @GordonLinoff I said *"The existence of an index has no bearing on the complexity."* >> I take that back :). Without an index the complexity would be worse. Heh :) – TT. Nov 15 '16 at 07:10