
Say, for instance, I'm joining on a numbers table to perform some operation between two dates in a subquery, like so:

select n
      ,(select avg(col1)
          from table1
         where timestamp between dateadd(minute, 15*n, @ArbitraryDate) 
                             and dateadd(minute, 15*(n+1), @ArbitraryDate))
  from numbers
 where n < 1200

Would the query perform better if I, say, constructed the date by concatenating varchars rather than using the dateadd function?
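
For concreteness, the varchar route I have in mind would be something like this (just a sketch; it only handles offsets that stay within a single day, which already shows how clumsy it gets):

    -- sketch: the same 15-minute boundary built from strings instead of dateadd
    -- (only valid while 15*n stays within one day)
    cast(convert(varchar(10), @ArbitraryDate, 120) + ' '
         + right('0' + cast(15*n / 60 as varchar(2)), 2) + ':'
         + right('0' + cast(15*n % 60 as varchar(2)), 2) as datetime)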

Daniel

6 Answers


Keeping the data in datetime format and using DATEADD is most likely to be quicker.

Check this question: Most efficient way in SQL Server to get date from date+time?

The accepted answer (not me!) demonstrates DATEADD over string conversions. I've seen another, too, many years ago, that showed the same.
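
If I recall the gist correctly, the comparison there is between the two idioms below for truncating a datetime to midnight (a minimal sketch, assuming SQL Server):

    -- datetime arithmetic: strip the time portion without leaving datetime
    select dateadd(dd, datediff(dd, 0, getdate()), 0);

    -- string route: convert to a date-only string, then cast back
    select cast(convert(varchar(8), getdate(), 112) as datetime);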

gbn
  • That's exactly the type of answer I was hoping for. +1 to you for finding it and +1 to Tomas for the great benchmark. Thanks! – Daniel Jan 20 '10 at 22:08

Be careful with between and dates; take a look at How Does Between Work With Dates In SQL Server?

I once optimized a query to run in 36 seconds instead of over 24 hours. Just don't use date functions or conversions on the column; see here: Only In A Database Can You Get 1000%+ Improvement By Changing A Few Lines Of Code

To see which query performs better, execute both queries and look at the execution plans. You can also use STATISTICS IO and STATISTICS TIME to see how many reads occurred and how long each query took to execute.
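
For example, a half-open range keeps each 15-minute bucket from overlapping the next (a sketch against the query in the question):

    -- half-open range: lower bound inclusive, upper bound exclusive, so the
    -- boundary instant dateadd(minute, 15*(n+1), ...) falls in the next bucket
    where timestamp >= dateadd(minute, 15*n, @ArbitraryDate)
      and timestamp <  dateadd(minute, 15*(n+1), @ArbitraryDate)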

SQLMenace
  • Thanks for the heads up on "between". I see I would be getting some overlap. – Daniel Jan 20 '10 at 18:09
  • The `BETWEEN` operator treats your range as a [closed range](https://en.wikipedia.org/wiki/Interval_(mathematics)#Notations_for_intervals), when standard practice is to use half-open ranges. For example, `[a, b)` means `a <= x < b`, so it is highly convenient to write things like "over the year 2019" as `[2019, 2020)` and to translate it to SQL as `@x >= '2019-01-01' AND @x < '2020-01-01'`, versus having to suffer from [data-type specific epsilons](https://stackoverflow.com/questions/21825615) doing `@x BETWEEN '2019-01-01' AND '2019-12-31 23:59:59.997'`, to avoid including `2020-01-01 00:00`. – Elaskanator Nov 15 '19 at 20:29

I would NOT go with concatenating varchars.

DATEADD will definitely perform better than string concatenation followed by a cast to DATETIME.

As always, your best bet is to profile the two options and determine the better result, as no specific DB is specified.

Adriaan Stander

Most likely there will be no difference one way or the other. I would run this:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

followed by both variants of your query, so that you see and compare real execution costs.

A-K

As long as your predicate calculations do not include references to the columns of the table you're querying, your approach shouldn't matter either way (go for clarity).

If you were to include something from table1 in the calculation, though, I'd watch out for table scans or covering index scans, as the predicate may no longer be sargable.

In any case, check (or post!) the execution plan to confirm.
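
For instance (a sketch using the question's column; only the second form lets the optimizer seek on an index over timestamp):

    -- non-sargable: wrapping the column in a function blocks an index seek
    where dateadd(minute, -15*n, timestamp) >= @ArbitraryDate

    -- sargable: leave the column bare and move the arithmetic to the constants
    where timestamp >= dateadd(minute, 15*n, @ArbitraryDate)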

Michael Haren

Why would you ever use a correlated subquery to begin with? That's going to slow you down far more than DATEADD will. They are like cursors: they work row by row. Will something like this work?

    select n.n, avgcol1
      from numbers n
      left outer join
           (
           -- note: this assumes table1 itself has an n column (see comments below)
           select n, avg(col1) as avgcol1
             from table1
            where timestamp between dateadd(minute, 15*n, @ArbitraryDate)
                                and dateadd(minute, 15*(n+1), @ArbitraryDate)
            group by n
           ) t
        on n.n = t.n
     where n.n < 1200
HLGEM
  • That's a good idea. The problem is that table1 is not enumerated (or necessarily enumerable). – Daniel Jan 20 '10 at 18:58
  • I replaced inner join with left outer one, so that this query is equivalent to the original one. – A-K Jan 20 '10 at 19:46
  • In general, it is plain wrong that correlated subqueries work row by row. We can compare execution plans and see for ourselves. – A-K Jan 20 '10 at 19:48
  • Alex a direct quote from books online: "In queries that include a correlated subquery (also known as a repeating subquery), the subquery depends on the outer query for its values. This means that the subquery is executed repeatedly, once for each row that might be selected by the outer query." – HLGEM Jan 20 '10 at 20:45
  • @HLGEM and @AlexKuznetsov, I still don't think this will work unless table1 is enumerable. table1 would require an n column on which to join. – Daniel Jan 20 '10 at 21:26
  • HLGEM, the quote from books online is just plain wrong. I'll bust that myth on sqlblog.com in a few minutes. – A-K Jan 20 '10 at 23:25
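
For what it's worth, if table1 carries no n column, one way to salvage the set-based rewrite is to derive n from timestamp itself (a sketch, assuming the same 15-minute buckets measured from @ArbitraryDate):

    -- sketch: compute the bucket number from timestamp so table1 needs no n column
    select datediff(minute, @ArbitraryDate, timestamp) / 15 as n,
           avg(col1) as avgcol1
      from table1
     where timestamp >= @ArbitraryDate
       and timestamp <  dateadd(minute, 15*1200, @ArbitraryDate)
     group by datediff(minute, @ArbitraryDate, timestamp) / 15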