2

It would seem to be (almost) as easy to scan a table over a clustered index, summarizing a field from the "previous record", as just to summarize that field over the entire table. But no :( Is there anything I can do?

create table #Tmp(n int not null primary key)
insert into #Tmp values(0)

declare @k int = 1

while @k < 1024 * 1024 * 32
begin
    insert into #Tmp
    select n + @k
    from #Tmp

    select @k = @k + @k
end

declare @dummy bigint

declare @d1 datetime = GetDate()

select
    @dummy = Sum(Convert(bigint, n))
from
    #Tmp

declare @d2 datetime = GetDate()

select
    @dummy = Sum(convert(bigint, n0))
from
    (
    select
        n0 = Lag(n) over (order by n)
    from
        #Tmp
    ) as Q

declare @d3 datetime = GetDate()

select Convert(time(3), @d2 - @d1), Convert(time(3), @d3 - @d2)
-- 00:00:01.460, 00:00:46.273

drop table #Tmp
  • The query with the `LAG()` function reads 32 million rows (index entries). It's bound to take some time. What's your expectation in terms of elapsed time? – The Impaler Sep 20 '21 at 13:36
  • @TheImpaler - Presumably to be 1 second rather than 46 – Martin Smith Sep 20 '21 at 13:40
  • Just for reference I've tried this on my 10 yr old local Dev server and I get `00:00:00.607 00:00:03.297` – Stu Sep 20 '21 at 13:41
  • @MartinSmith Still... 46s is quite good, for a 32-million row query. If the OP wants faster performance maybe a cache (refreshed every 5-10 mins) could be used if data does not need to represent the exact up-to-date information. – The Impaler Sep 20 '21 at 13:42
  • But the point is that if scanning the entire 32 million row table and aggregating it takes 1 second why does the LAG take so much longer as it doesn't need to sort anything – Martin Smith Sep 20 '21 at 13:45
  • One thing to point out I'm using 2019 and the query executes in Batch mode, using hint `DISALLOW_BATCH_MODE` it takes about a minute so... upgrade to SQL 2019 to take advantage of batch mode. – Stu Sep 20 '21 at 13:47
  • 1
    I see parallelism for the non-LAG version which when you think about it is an obvious optimization when there's no dependency between rows, and although the final result of the `LAG` version discards any notion of ordering, it appears the optimizer cannot see through that. – Damien_The_Unbeliever Sep 20 '21 at 13:55
  • On 2019 localdb I don't see parallelism for either (expectedly) nor batch mode. I only loaded 16 m rows in. And see a 5x difference in elapsed time. Possibly not much that can be done about the timings for the additional execution plan operators apart from trying to get batch mode https://i.stack.imgur.com/DqLbp.png (on versions before batchmode on rowstore came out there is a trick to do that by referencing a table with a columnstore as a no-op) – Martin Smith Sep 20 '21 at 14:02
  • My 2019 gives me paralellism for the first query and batch mode for both [https://i.stack.imgur.com/zf7Ei.png](https://i.stack.imgur.com/zf7Ei.png). The bulk of the time without batch mode is spent on a Window Spool which is not present for batch mode. – Stu Sep 20 '21 at 14:11
  • yeah localdb doesn't support parallel plans so that was expected – Martin Smith Sep 20 '21 at 14:12
  • The batch mode window aggregate operator is available from 2016 according to here https://www.itprotoday.com/sql-server/what-you-need-know-about-batch-mode-window-aggregate-operator-sql-server-2016-part-1 so if the OP can manage to get it should give a good improvement – Martin Smith Sep 20 '21 at 14:15
  • @Andrew - what version of SQL Server are you on? – Martin Smith Sep 20 '21 at 14:16
  • The trick with `CREATE NONCLUSTERED COLUMNSTORE INDEX idx_cs_dummy ON #Tmp(n) WHERE n = -1 AND n = -2;` gave me the batchmode plan and a good speedup https://i.stack.imgur.com/jfhg9.png – Martin Smith Sep 20 '21 at 14:19

2 Answers2

1

Using SQL server prior to 2019 you can utilise some benefits of Batch Mode by creating a dummy table with a columnstore index.

Running the window function query on SQL Server 2016 it took 25 seconds:

enter image description here

However, after creating the following table

create table dbo.bmode (Id int not null)
create nonclustered columnstore index CC_BatchModeHack on dbo.bmode (Id) 

I can include it in the query using an outer join. It actually plays no part in the query at all however tricks the optimizer into selecting batch mode for the window function

select @dummy = Sum(Convert(bigint, n0))
from (
  select n0 = Lag(n) over (order by n)
  from #Tmp
  left join bmode on 1=0
)q

the result is execution time fell to almost 2 seconds

enter image description here

Stu
  • 30,392
  • 6
  • 14
  • 33
0

..for sql2019..

select
    @dummy = Sum(convert(bigint, n0)) * (1+APPROX_COUNT_DISTINCT(n0)-APPROX_COUNT_DISTINCT (n0))   
from …
lptr
  • 1
  • 2
  • 6
  • 16