25

My table (SQL Server 2008) has 1 million+ records. When I try to order the records by a datetime column, it takes about 1 second, but when I order by ID (int), it only takes about 0.1 second.

Is there any way to improve the efficiency? (I already added the datetime column to the index)

silent
  • 3,964
  • 5
  • 27
  • 29
  • 8
    is that datetime column in a separate index of its own? You say "added .. to the index".... if the datetime column is e.g. column no. 3 in a compound index, that won't help at all when trying to order by that datetime column alone........ – marc_s Nov 11 '09 at 17:38
  • I had added it to an existing index, I just tried to create a new index, it gets relatively faster (0.5 seconds), but still slower than an int column. – silent Nov 11 '09 at 17:45
  • Could you please post your query and the table definition? – Quassnoi Nov 11 '09 at 17:50
  • @silent: a DATETIME column is 8 bytes - an INT is 4 bytes. This means that on your index pages, the INT index fits twice as many entries on an 8K page as a DATETIME index - no way to change that. So an INT index will be faster since it incurs less I/O, which is the expensive part. – marc_s Nov 11 '09 at 18:13
  • `@marc_s`: `ID` is either a clustered `PRIMARY KEY` or not. In the first case the index on `ID` contains the whole table which is much more than `4` bytes per record. In the second case, indexes on both `id` and `datetime` will contain a row pointer. In both cases, the difference in index size will be less than twice. It's the `Key Lookup` / `RID Lookup` which matters here, not the index size. – Quassnoi Nov 11 '09 at 18:22

8 Answers

26

Ordering by id probably uses a clustered index scan while ordering by datetime uses either sorting or index lookup.

Both of these methods are slower than a clustered index scan.

If your table is clustered by id, basically it means it is already sorted. The records are contained in a B+Tree which has a linked list linking the pages in id order. The engine should just traverse the linked list to get the records ordered by id.

If the ids were inserted in sequential order, the physical order of the rows will match the logical order, and the clustered index scan will be even faster.

If you want your records to be ordered by datetime, there are two options:

  • Take all records from the table and sort them. Slowness is obvious.
  • Use the index on datetime. The index is stored in a separate area on disk, which means the engine needs to shuttle between the index pages and the table pages in a nested loop. This is slow too.

To improve the ordering, you can create a separate covering index on datetime:

CREATE INDEX ix_mytable_datetime ON mytable (datetime) INCLUDE (field1, field2, …)

, and include all the columns you use in your query in that index.

This index is like a shadow copy of your table, but with the data sorted in a different order.

This will allow the engine to get rid of the key lookups (since the index contains all the data), which will make ordering by datetime as fast as ordering by id.
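
For illustration, a query of the following shape (field1, field2 and the datetime column are the placeholder names from the index definition above, not your real columns) could then be satisfied entirely from the covering index, already in the right order:

-- Every referenced column lives in the covering index, so the engine can
-- stream rows straight from the index leaf level, sorted by datetime,
-- with no key lookups into the base table.
SELECT field1, field2, [datetime]
FROM mytable
ORDER BY [datetime];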


Quassnoi
  • 413,100
  • 91
  • 616
  • 614
  • Is there any efficient way for doing this? – silent Nov 11 '09 at 17:35
  • I believe the database will have stored it in this format anyway (and likely compares it in this manner as well). I thought of the same thing at first too, but I don't believe that it should be the answer. – Jrud Nov 11 '09 at 17:38
  • 1
    +1 To optimize for the datetime sort, make the datetime the clustered index, and the id index a nonclustered primary key. – Andomar Nov 11 '09 at 17:43
6

To honor the ORDER BY the engine has two alternatives:

  • scan the rows using an index that offers the order requested
  • sort the rows

The first option is fast, the second is slow. The problem is that, in order to be used, the index has to be a covering index, meaning it contains all the columns in the SELECT projection list and all the columns used in WHERE clauses (at a minimum). If the index is not covering, then the engine has to look up the clustered index (i.e. the 'table') for each row in order to retrieve the values of the needed columns. This constant lookup of values is expensive, and there is a tipping point where the engine will (rightfully) decide it is more efficient to just scan the clustered index and sort the result, in effect ignoring your non-clustered index. For details, see The Tipping Point Query Answers.

Consider the following three queries:

SELECT dateColumn FROM [table] ORDER BY dateColumn
SELECT * FROM [table] ORDER BY dateColumn
SELECT someColumn FROM [table] ORDER BY dateColumn

The first one will use a non-clustered index on dateColumn. But the second one will not use an index on dateColumn; for 1M rows it will likely choose a scan and sort instead. On the other hand, the third query can benefit from an index on Table(dateColumn) INCLUDE (someColumn).
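
For example (using the placeholder names from the queries above), such an index could be created like this:

-- Key on the ORDER BY column, with the selected column carried along
-- at the leaf level so no lookups into the clustered index are needed.
CREATE NONCLUSTERED INDEX ix_table_dateColumn
ON [table] (dateColumn)
INCLUDE (someColumn);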

This topic is covered at length on MSDN; see Index Design Basics, General Index Design Guidelines, Nonclustered Index Design Guidelines, or How To: Optimize SQL Indexes.

Ultimately, the most important choice of your table design is the clustered index you use. Almost always the primary key (usually an auto incremented ID) is left as the clustered index, a decision that benefits only certain OLTP loads.

And finally, a rather obvious question: Why in the world would you order 1 million rows?? You can't possibly display them, can you? Explaining a little bit more about your use case might help us find a better answer for you.

Remus Rusanu
  • 288,378
  • 40
  • 442
  • 569
  • Thank you Remus, I don't want to display 1 million+ records at one time; I'm using the row_number() method for paging, but I found that when I try to browse a big page number, like page 50000 (20 rec/page), the query becomes very slow. But if I change the ordering to ID, it gets almost 10 times faster. – silent Nov 12 '09 at 03:45
  • 1
    I thought this must be row_number pagination. Is it LINQ, by any chance? Your best bet is to express the query as follows: select the IDs of the page you're interested in (e.g. the 20 IDs on page 17), then retrieve the details of those 20 records. This *can* be expressed in T-SQL and also in LINQ and can be fast enough (see the sketch below). You'd better post the schema and queries involved. – Remus Rusanu Nov 12 '09 at 04:21
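
A rough sketch of that two-step approach, under assumed names (a table mytable with a clustered primary key id, a [datetime] column, and a nonclustered index on it); this is not the poster's actual schema:

-- Step 1: number the rows over the narrow nonclustered index on [datetime]
--         and keep only the ids belonging to the requested page.
-- Step 2: join back to the table to fetch the wide columns for just those ids.
DECLARE @PageNumber int = 50000, @PageSize int = 20;

;WITH PageKeys AS
(
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY [datetime]) AS rn
    FROM mytable
)
SELECT t.*
FROM PageKeys AS k
JOIN mytable AS t
    ON t.id = k.id
WHERE k.rn BETWEEN (@PageNumber - 1) * @PageSize + 1
               AND @PageNumber * @PageSize
ORDER BY k.rn;
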
2

Add the datetime column to a new index of its own; adding it to the existing id index will still not help much.
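
A sketch of what that might look like, with placeholder table and column names (not taken from the question):

-- A dedicated single-column index on the datetime column, rather than
-- appending the column to an existing compound index (where it would not
-- help an ORDER BY on that column alone).
CREATE NONCLUSTERED INDEX ix_mytable_mydatetime ON dbo.mytable (mydatetime);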

Mark Dickinson
  • 6,573
  • 4
  • 29
  • 41
1

Could it be that there is an index for your int column but not for your datetime column? Look at the execution plan.
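
As a quick way to check both things (object names below are placeholders), you can list the table's indexes and ask the engine for the plan of the slow query:

-- List the indexes that currently exist on the table.
EXEC sp_helpindex 'dbo.mytable';

-- Request the estimated XML plan for the slow query;
-- SET SHOWPLAN_XML must be the only statement in its batch, hence the GO separators.
SET SHOWPLAN_XML ON;
GO
SELECT id, mydatetime FROM dbo.mytable ORDER BY mydatetime;
GO
SET SHOWPLAN_XML OFF;
GO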

Nestor
  • 13,706
  • 11
  • 78
  • 119
0

Maybe you could store the datetime as an int, but it would take time converting it every time you store or retrieve data. (This is a common technique used for things like IP addresses to get faster seek times.)

You should check how your server stores datetime, because if it already stores it as an int or bigint internally, converting it yourself will not change anything.
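
If you did want to experiment with that technique anyway, a rough sketch might look like this (the epoch and variable names are illustrative; a plain int holds only about 68 years' worth of seconds):

-- Convert a datetime to whole seconds since an arbitrary epoch, and back.
DECLARE @dt datetime = '2009-11-11T17:30:00';
DECLARE @asInt int = DATEDIFF(SECOND, '19700101', @dt);

SELECT @asInt                              AS stored_value,
       DATEADD(SECOND, @asInt, '19700101') AS round_tripped;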

Dani
  • 14,639
  • 11
  • 62
  • 110
0

If your datetime field contains a lot of distinct values and those values rarely change, define a clustered index on the datetime field; this will sort the actual data by the datetime value. See http://msdn.microsoft.com/en-us/library/aa933131(SQL.80).aspx for using clustered indexes.

This will make your int searches slower though, as they will be relegated to using a non-clustered index.
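
Under assumed object names (the constraint name PK_mytable and the columns are placeholders), that restructuring might look roughly like this; note that it rewrites the table's physical order, which is not cheap on 1M+ rows:

-- Recreate the primary key as nonclustered, then cluster the table
-- on the datetime column so the data pages are sorted by it.
ALTER TABLE dbo.mytable DROP CONSTRAINT PK_mytable;
ALTER TABLE dbo.mytable ADD CONSTRAINT PK_mytable PRIMARY KEY NONCLUSTERED (id);
CREATE CLUSTERED INDEX cx_mytable_mydatetime ON dbo.mytable (mydatetime);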

badbod99
  • 7,429
  • 2
  • 32
  • 31
0

Have you added the DateTime field to "the" index or to a dedicated index of its own? Are you filtering your selection by another field in addition to the DateTime, or only by this one?

To optimize performance, you must have an index with all the fields that you are filtering on, preferably in the same order.
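
For example, if you filter on a hypothetical status column and order by the datetime column (names invented for illustration), the index should list the filtered field first and the ordering field after it:

-- Composite index: the WHERE column leads, the ORDER BY column follows,
-- so the engine can seek on status and read rows already sorted by mydatetime.
CREATE NONCLUSTERED INDEX ix_mytable_status_mydatetime
ON dbo.mytable (status, mydatetime);

SELECT id, status, mydatetime
FROM dbo.mytable
WHERE status = 1
ORDER BY mydatetime;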

Michael Dillon
  • 31,973
  • 6
  • 70
  • 106
j.a.estevan
  • 3,057
  • 18
  • 32
  • I had added it to an existing index, I just tried to create a new index, it gets relatively faster (0.5 seconds), but still slower than an int column. – silent Nov 11 '09 at 17:44
    It is important to make a good index by looking at the fields in the "WHERE" and "ORDER BY" statements (and "GROUP BY", if any). It should be the same fields and in the same order. If you don't need every column in the table, make the SELECT with only the desired fields. Then clear the cache and statistics and compare the results. Also bear in mind that it is difficult to select at the same speed as the clustered index in tables with lots of data. This is the reason to choose carefully which index should be the clustered one. – j.a.estevan Nov 11 '09 at 18:59
0

In some cases you can use the datetime column as the clustered index. If your table's primary key is not important, for example in a user-logs table, you can make the primary key nonclustered, then create a combined index on the time and user id columns and make that index the clustered one.
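
A sketch of that layout for a hypothetical user-logs table (all names invented for illustration):

-- The primary key is kept for uniqueness but made nonclustered;
-- the table itself is clustered on (log_time, user_id) so queries
-- that order or filter by time read the rows in physical order.
CREATE TABLE dbo.user_logs
(
    log_id   int IDENTITY(1,1) NOT NULL,
    user_id  int           NOT NULL,
    log_time datetime      NOT NULL,
    message  nvarchar(400) NULL,
    CONSTRAINT PK_user_logs PRIMARY KEY NONCLUSTERED (log_id)
);

CREATE CLUSTERED INDEX cx_user_logs_time_user
ON dbo.user_logs (log_time, user_id);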

Hugejile
  • 83
  • 1
  • 1