SQL performance: WHERE vs WHERE(ROW_NUMBER)

Question

I want get n-th to m-th records in a table, what's best choice in 2 below solutions:

Solution 1:

    SELECT * FROM Table WHERE ID >= n AND ID <= m

Solution 2:

    SELECT * FROM 
                (SELECT *, 
                        ROW_NUMBER() OVER (ORDER BY ID) AS row 
                 FROM Table 
                )a 
    WHERE row >= n AND row <= m

Performance is obviously Solution 1. You should change the title if your going to take the 2nd answer as best. — danny117, Nov 01 '13 at 14:34

score 63 · Answer 1 · edited Oct 29 '13 at 23:36

As other already pointed out, the queries return different results and are comparing apples to oranges.

But the underlying question remains: which is faster: keyset driven paging or rownumber driven paging?

Keyset Paging

Keyset driven paging relies on remembering the top and bottom keys of the last displayed page, and requesting the next or previous set of rows, based on the top/last keyset:

Next page:

select top (<pagesize>) ...
from <table>
where key > @last_key_on_current_page
order by key;

Previous page:

select top (<pagesize>)
from <table>
where key < @first_key_on_current_page
order by key desc;

This approach has two main advantages over the ROW_NUMBER approach, or over the equivalent LIMIT approach of MySQL:

is correct: unlike the row number based approach it correctly handles new entries and deleted entries. Last row of Page 4 does not show up as first row of Page 5 just because row 23 on Page 2 was deleted in the meantime. Nor do rows mysteriously vanish between pages. These anomalies are common with the row_number based approach, but the key set based solution does a much better job at avoiding them.
is fast: all operations can be solved with a fast row positioning followed by a range scan in the desired direction

However, this approach is difficult to implement, hard to understand by the average programmer and not supported by the tools.

Row Number Driven

This is the common approach introduced with Linq queries:

select ...
from (
  select ..., row_number() over (...) as rn
  from table)
where rn between @firstRow and @lastRow;

(or a similar query using TOP) This approach is easy to implement and is supported by tools (specifically by Linq .Limit and .Take operators). But this approach is guaranteed to scan the index in order to count the rows. This approach works usually very fast for page 1 and gradually slows down as the an one goes to higher and higher page numbers.

As a bonus, with this solution is very easy to change the sort order (simply change the OVER clause).

Overall, given the ease of the ROW_NUMBER() based solutions, the support they have from Linq, the simplicity to use arbitrary orders for moderate data sets the ROW_NUMBER based solutions are adequate. For large and very large data sets, the ROW_NUMBER() can occur serious performance issues.

One other thing to consider is that often times there is a definite pattern of access. Often the first few pages are hot and pages after 10 are basically never viewed (eg. most recent posts). In this case, the penalty that occurs with ROW_NUMBER() for visiting bottom pages (display pages for which a large number of rows have to be counted to get the starting result row) may be well ignored.

And finally, the keyset pagination is great for dictionary navigation, which ROW_NUMBER() cannot accommodate easily. Dictionary navigation is where instead of using page number, users can navigate to certain anchors, like alphabet letters. Typical example being a contact Rolodex like sidebar, you click on M and you navigate to the first customer name that starts with M.

The correctness of keyset paging (also known as [seek method](http://blog.jooq.org/2013/10/26/faster-sql-paging-with-jooq-using-the-seek-method/)) depends on the point of view. Sometimes, you want to correlate the page number with the records' row numbers, e.g. when you display a rank (top 20-30 players in a ranking). Anyway, I think this explanation deserves a bit more attention! — Lukas Eder, Oct 27 '13 at 13:07
You also can't skip to page 7, or if you have to allow that you need to implement it separately. Not a deal-breaker in many scenarios, but might matter to some. — Doug McClean, Oct 27 '13 at 16:32
@DougMcClean: You can skip to page 7 in two queries, though. Or you don't correlate UI pages with DB pages (the latter being a bit larger). It's a bit more of a hassle, agreed, but on average, might still be faster. — Lukas Eder, Oct 27 '13 at 16:52
@LukasEder: rolled back your edit about JOOQ. There are other client side cursor libraries, eg. [`ODBC`](http://technet.microsoft.com/en-us/library/ms130900.aspx) and I did not want to go into specifics of 3rd party. I'm sure there are at least few that do a good job. When I said 'not supported by the tools' I had in mind the typical 2010-ish Windows/SQL Server toolset (that is, mostly Linq or EF). — Remus Rusanu, Oct 27 '13 at 18:31

score 12 · Accepted Answer · answered Jul 09 '10 at 12:53

12

The 2nd answer is your best choice. It takes into account the fact that you could have holes in your ID column. I'd rewrite it as a CTE though instead of a subquery...

;WITH MyCTE AS
(SELECT  *,  
         ROW_NUMBER() OVER (ORDER BY ID) AS row  
FROM     Table)
SELECT   *
FROM     MyCTE
WHERE    row >= @start 
         AND row <= @end

answered Jul 09 '10 at 12:53

Scott Ivey

40,768
21
80
118

I'd probably use 'between' for the where clause, but you're right. The first doesn't guarantee that you return row n through m at all, like gbn mentioned as well. – Rob Jul 09 '10 at 12:56

gbn · Answer 3 · 2010-07-09T13:00:25.773

2

They are different queries.

Assuming ID is a surrogate key, it may have gaps. ROW_NUMBER will be contiguous.

If you can guarantee you have no gaps in the data, then the 1st one because I'd hope it's indexed,. The 2nd one is more "correct" though.

edited Jul 09 '10 at 13:00

answered Jul 09 '10 at 12:53

gbn

422,506
82
585
676

SQL performance: WHERE vs WHERE(ROW_NUMBER)

3 Answers3

Keyset Paging

Row Number Driven

Linked