2

given if we have an 'Orders' table and columns like this

+------------+-----------------------+
|   Orders   |                       |
+------------+-----------------------+
| OrderId    | => clustered index    |
| CustomerId |                       |
| OrderDate  | => nonclustered index |
+------------+-----------------------+

so my question is, what the core difference between those queries, and which one is better in the performance perspective mainly

select * from Orders 
where OrderDate >= '2000-01-01' and OrderDate <= '2000-12-31' 

and

select * from Orders 
where YEAR(OrderDate) = 2000
Beshoy Hindy
  • 164
  • 2
  • 9
  • Meanwhile, you can throw detail like, what type of query will be done one Order table.What date range will be mostly use ? what date range will be less use ? – KumarHarsh Nov 19 '19 at 12:05

3 Answers3

2

This query:

select *
from Orders 
where OrderDate >= '2000-01-01' and OrderDate <= '2000-12-31' 

Can take advantage of three things that help to optimize the query:

  • An index on OrderDate.
  • Partitions on OrderDate.
  • Statistics maintained on OrderDate to optimize the plan.

This version:

where YEAR(OrderDate) = '2000'

has one major problem. The types are different -- why compare a string to a function that returns a number. It cannot use an index or partitions on OrderDate. And the statistics are less useful for optimizing the query plan.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • what if the condition compared to a number => where YEAR(OrderDate) = 2000 – Beshoy Hindy Nov 19 '19 at 11:50
  • @BeshoyHindy,where YEAR(OrderDate) = 2000 this will solve one minor problem only.Rest of the issue as describe will be still there. – KumarHarsh Nov 19 '19 at 12:00
  • To add to the answer the first query will use the indexes. YEAR(OrderDate) = 2000 will go through all the records. The problem might not be visible if there are less number of records but imagine if you have 10 million records, not using an existing index will hurt performance. – ankyskywalker Nov 19 '19 at 12:02
  • 1
    *"To add to the answer the first query will use the indexes.,, will go through all the records. "* No, the question is if SQL Server would even consider using a index with the first query also, it depends how many (expected) records match that query from the tables total, the optimizer is still costs based which Gordon also explained @ankyskywalker *" imagine if you have 10 million records, not using an existing index will hurt performance"* in fact using 10 million lookups (possible random disk i/o) on index will hurt performance more then a full table scan... When i doubt use EXPLAIN.. – Raymond Nijland Nov 19 '19 at 12:28
  • Great point @Raymond Nijland, I tried that with a table which has about a million records and a non-clustered index on a datetime column. YEAR() does lookup all of the rows whereas '<=' and '>=' looks up about a 1000 rows to pull the dataset which I wanted, about 900 rows. and yes, When i doubt use EXPLAIN – ankyskywalker Nov 19 '19 at 20:50
2

There is a "sargable queries" aspect

" The SQL optimizer can't use an index on myDate, even if one exists. It will literally have to evaluate this function for every row of the table. Much better to use: WHERE myDate >= '01-01-2008' AND myDate < '01-01-2009' "

look at this question, the first answer: What makes a SQL statement sargable?

Benzi
  • 398
  • 6
  • 14
1
+------------+-----------------------+
|   Orders   |                       |
+------------+-----------------------+
| OrderId    | => clustered index    |
| CustomerId |                       |
| OrderDate  | => nonclustered index |
+------------+-----------------------+

OrderId as clustered index is good enough when Orderid is INT or BIGINT and it is unique(i.e. Selective enough).

OrderDate : It is good choice for nonclustered index. It lack covering columns.

Create nonClustered index order_orderdate on Order(orderdate)include(Customerid,other column which is require in resultset)

Do not use *. mention all column which is require in ResultSet

As already discuss,

where OrderDate >= '2000-01-01' and OrderDate <= '2000-12-31'

is better than

where YEAR(OrderDate) = 2000

Date range can be across year also.

Still, where OrderDate >= '2000-01-01' and OrderDate <= '2000-12-31'

do not guarantee Index Seek plan.It depend upon so many other thing also.

Sql Optimizer often make good enough plan quickly.

You have to find,understand why you didn't get Index Seek and whether plan is good enough or not, whether plan is cost effective or not.

KumarHarsh
  • 5,046
  • 1
  • 18
  • 22