I have to work with a potentially large list of records, and I've been Googling for ways to avoid selecting the whole list. Instead, I want to let users select a page (e.g. from 1 to 10) and display the records accordingly.

Say, for 1000 records I would have 100 pages of 10 records each. The most recent 10 records are displayed first; if the user clicks on page 5, it shows records 41 to 50.

Is it a good idea to add a row number to each record and then query based on row number? Is there a better way of achieving paging without too much overhead? So far, the methods described here look the most promising:

http://developer.berlios.de/docman/display_doc.php?docid=739&group_id=2899

http://www.codeproject.com/KB/aspnet/PagingLarge.aspx
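The row arithmetic described above is easy to pin down; a minimal sketch (hypothetical helper, 1-based page numbers):

```python
def page_bounds(page_number, page_size):
    """Return the 1-based first and last row numbers covered by a page."""
    first = (page_number - 1) * page_size + 1
    last = page_number * page_size
    return first, last

# For 10 records per page, page 5 covers rows 41-50,
# matching the example in the question.
print(page_bounds(5, 10))  # (41, 50)
```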

fred

7 Answers


The following T-SQL stored procedure is a very efficient implementation of paging. The SQL optimiser can find the first ID very fast. Combine this with the use of ROWCOUNT, and you have an approach that is both CPU-efficient and read-efficient. For a table with a large number of rows, it certainly beats any approach I've seen that uses a temporary table or table variable.

NB: I'm using a sequential identity column in this example, but the code works on any column suitable for page sorting. Also, sequence breaks in the column being used don't affect the result as the code selects a number of rows rather than a column value.

EDIT: If you're sorting on a column with potentially non-unique values (eg LastName), then add a second column to the Order By clause to make the sort values unique again.

CREATE  PROCEDURE dbo.PagingTest
(
    @PageNumber int,
    @PageSize int
)
AS

DECLARE @FirstId int, @FirstRow int

SET @FirstRow = ( (@PageNumber - 1) * @PageSize ) + 1
SET ROWCOUNT @FirstRow

-- Add check here to ensure that @FirstRow is not
-- greater than the number of rows in the table.

SELECT   @FirstId = [Id]
FROM     dbo.TestTable
ORDER BY [Id]

SET ROWCOUNT @PageSize

SELECT   *
FROM     dbo.TestTable
WHERE    [Id] >= @FirstId
ORDER BY [Id]

SET ROWCOUNT 0
GO 
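The same two-step idea (first seek to the page's starting key, then read one page onwards from that key) can be sketched outside T-SQL. Here is a hypothetical SQLite/Python equivalent, where the offset is applied only to the narrow key lookup rather than to the full row set:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestTable (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO TestTable (Id, Name) VALUES (?, ?)",
                 [(i, "row %d" % i) for i in range(1, 101)])

def get_page(page_number, page_size):
    # Step 1: locate the first Id of the requested page.
    # Only the key column is touched, which keeps this step cheap.
    first_id = conn.execute(
        "SELECT Id FROM TestTable ORDER BY Id LIMIT 1 OFFSET ?",
        ((page_number - 1) * page_size,)).fetchone()[0]
    # Step 2: read one full page starting at that key.
    return conn.execute(
        "SELECT Id, Name FROM TestTable WHERE Id >= ? ORDER BY Id LIMIT ?",
        (first_id, page_size)).fetchall()

print(get_page(5, 10)[0])  # (41, 'row 41')
```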
HTTP 410
  • @ScottE: id breaks make no difference to this code, because the upper size is the number of rows in the page (not the id number). – HTTP 410 Dec 05 '10 at 14:15
  • @ScottE: also, the column used doesn't need to be an incremental id. You can use any column suitable for sorting. – HTTP 410 Dec 05 '10 at 14:17
  • @Martin: agreed, TOP should work just as well with any version of SQL Server later than 2000. I've not tested that, so I just went with the code that I have tested. – HTTP 410 Dec 05 '10 at 14:24
  • On thinking about it, this will still end up doing 2 different index scans. I'd have thought this would be slightly less efficient than just using `row_count` and doing it all in one scan, as it will have to navigate the index root and intermediate pages to "find its place" when doing the `WHERE [Id] >= @FirstId` bit. That is assuming that both scans are on the same index, though. I suppose if the first scan were against an extremely narrow index on id alone and the second against a covering index with `id` as the leading field, this might perform better! – Martin Smith Dec 05 '10 at 14:45
  • @David: SQL Server int. You don't think 2^31-1 is big enough? – HTTP 410 Dec 08 '10 at 11:00
  • @RoadWarrior belated reply - the problem isn't with integer precision; it's with non-unique keys in a multi-server, distributed environment. Log shipping, etc. work around this, but they're kludges that could be *easily* fixed by using a GUID. (I have yet to hear a factually-correct argument against this, but I'm all ears.) – 3Dave May 29 '12 at 22:30
  • @David, in 20+ years I've never used sharding, database replication, or any other distributed database techniques. A database cluster is as far as I've gone, so I've always been able to use identity columns. I agree that identities aren't the ideal approach in an environment where your database really has to be distributed. – HTTP 410 May 30 '12 at 10:28
  • @RoadWarrior and ints are totally acceptable in that circumstance. I'm dealing with geographically-distributed databases that have to keep working when the fat transatlantic undersea pipe fails, so guaranteed uniqueness is a bit more involved. – 3Dave May 30 '12 at 13:34
  • I take it this only works if the column you're ordering on contains unique values? What if I wanted to sort on last name? Then when I select `@FirstId` (or rather, `@FirstValue`), I can't do >= on it, because it would then include rows that shouldn't be there, since their lastname values are the same as the lastname value at `@FirstRow`. – MajorRefactoring Nov 09 '12 at 18:40
  • @MajorRefactoring, yes, you're correct. If you use a column containing non-unique values for page sorting, then if the non-unique values span over a page boundary, you'll see an unpredictable sorting at that page boundary. To fix this, I recommend that you add a second column to the Order By clause. As long as the combination of columns has a unique value, then this will work. – HTTP 410 Nov 10 '12 at 01:14
  • @RoadWarrior - ah. Clever. Nice little technique. – MajorRefactoring Nov 12 '12 at 15:27
  • You might be interested in the "seek method", which I have described [in my answer](http://stackoverflow.com/a/19609594/521799). It is a more generic approach of using predicates (and thus indexing) over offsets for paging. – Lukas Eder Oct 26 '13 at 17:22
  • Hi, what about if you have where conditions? –  Aug 01 '18 at 12:16

If you use a CTE with two row_number() columns (one sorted ascending, one descending), you get row numbers for paging, and the sum of the two columns (minus one) gives the total record count without a separate COUNT query.

create procedure get_pages(@page_number int, @page_length int)
as
    set nocount on;

    with cte as
    (
        select
            Row_Number() over (order by sort_column desc) as row_num
            ,Row_Number() over (order by sort_column) as inverse_row_num
            ,id as cte_id
        from my_table
    )
    select
        row_num + inverse_row_num - 1 as total_rows  -- the two row numbers always sum to total + 1
        ,my_table.*
    from cte inner join my_table
        on cte_id = my_table.id
    where row_num between                            -- @page_number is zero-based
        (@page_number * @page_length) + 1
        and (@page_number + 1) * @page_length
    order by row_num
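Here is a small runnable sketch of the same double-ROW_NUMBER trick, translated to SQLite (table name and page values are made up); for any row, the two row numbers always sum to the total row count plus one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, sort_column INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)",
                 [(i, i * 10) for i in range(1, 26)])   # 25 rows

rows = conn.execute("""
    WITH cte AS (
        SELECT ROW_NUMBER() OVER (ORDER BY sort_column DESC) AS row_num,
               ROW_NUMBER() OVER (ORDER BY sort_column ASC)  AS inverse_row_num,
               id AS cte_id
        FROM my_table
    )
    SELECT row_num + inverse_row_num - 1 AS total_rows,  -- sum is always total + 1
           cte_id
    FROM cte
    WHERE row_num BETWEEN 11 AND 20                      -- page 2 of size 10
    ORDER BY row_num
""").fetchall()

print(rows[0][0], len(rows))  # 25 10
```

Each page thus carries the total row count along with it, so no second query is needed to size the pager.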
3Dave
  • A note on the inverse sorting: if the column has non-unique data, then you have a chance of the sorting in the OVER clause being inconsistent. – Tyler Clendenin Sep 17 '14 at 18:13
  • Also, you can use ", Count(*) OVER() as total_row_count" which will return a column of the same number over and over again. – Tyler Clendenin Sep 17 '14 at 18:16

Using OFFSET

Others have explained how the ROW_NUMBER() OVER() ranking function can be used to perform paging. It's worth mentioning that SQL Server 2012 finally includes support for the SQL standard OFFSET .. FETCH clause:

SELECT first_name, last_name, score
FROM players
ORDER BY score DESC
OFFSET 40 ROWS FETCH NEXT 10 ROWS ONLY

If you're using SQL Server 2012 and backwards-compatibility is not an issue, you should probably prefer this clause as it will be executed more optimally by SQL Server in corner cases.
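Outside SQL Server, most engines (SQLite, MySQL, PostgreSQL) spell the same standard idea as LIMIT .. OFFSET; a quick sketch with made-up data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (player_id INTEGER PRIMARY KEY, "
             "first_name TEXT, last_name TEXT, score INTEGER)")
conn.executemany("INSERT INTO players VALUES (?, ?, ?, ?)",
                 [(i, "f%d" % i, "l%d" % i, i) for i in range(1, 101)])

# Skip 40 rows, fetch the next 10 -- the same page the OFFSET .. FETCH query returns.
page = conn.execute(
    "SELECT first_name, last_name, score FROM players "
    "ORDER BY score DESC LIMIT 10 OFFSET 40").fetchall()

print(page[0][2], page[-1][2], len(page))  # 60 51 10
```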

Using the SEEK Method

There is an entirely different, much faster way to perform paging in SQL. This is often called the "seek method" as described in this blog post here.

SELECT TOP 10 first_name, last_name, score
FROM players
WHERE (score < @previousScore)
   OR (score = @previousScore AND player_id < @previousPlayerId)
ORDER BY score DESC, player_id DESC

The @previousScore and @previousPlayerId values are the respective values of the last record from the previous page. This allows you to fetch the "next" page. If the ORDER BY direction is ASC, simply use > instead.

With the above method, you cannot immediately jump to page 4 without having first fetched the previous 40 records. But often, you do not want to jump that far anyway. Instead, you get a much faster query that might be able to fetch data in constant time, depending on your indexing. Plus, your pages remain "stable", no matter if the underlying data changes (e.g. on page 1, while you're on page 4).

This is the best way to implement paging when lazy loading more data in web applications, for instance.

Note, the "seek method" is also called keyset paging.
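A minimal runnable sketch of keyset paging (SQLite/Python translation, made-up data with deliberately non-unique scores, using the same two-column predicate as above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (player_id INTEGER PRIMARY KEY, score INTEGER)")
# Deliberately non-unique scores, so the tie-breaking second column matters.
conn.executemany("INSERT INTO players VALUES (?, ?)",
                 [(i, i % 10) for i in range(1, 51)])

def next_page(prev_score=None, prev_id=None, size=10):
    if prev_score is None:
        # First page: no seek predicate yet.
        return conn.execute(
            "SELECT player_id, score FROM players "
            "ORDER BY score DESC, player_id DESC LIMIT ?", (size,)).fetchall()
    # Seek past the last row of the previous page.
    return conn.execute(
        "SELECT player_id, score FROM players "
        "WHERE score < ? OR (score = ? AND player_id < ?) "
        "ORDER BY score DESC, player_id DESC LIMIT ?",
        (prev_score, prev_score, prev_id, size)).fetchall()

page1 = next_page()
page2 = next_page(prev_score=page1[-1][1], prev_id=page1[-1][0])
print(len(page1), len(page2), set(page1) & set(page2))  # 10 10 set()
```

Because (score, player_id) forms a strict total order, consecutive pages never overlap even when many players share a score.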

Lukas Eder
  • While a good case can be made that jump-to-page search UIs aren't a great design anyway, I think you really do need to show your users how many results their search returned (or at least an approximation, as Google does, though I'm not sure any SQL databases have that sort of functionality). So this method requires an additional count query, which ups its cost quite a lot. – EricS Aug 18 '14 at 22:40
  • This may be a YMMV thing, but I did a comparison between this seek method (on sql server 2008, alas) and the "holy grail" method from the sqlservercentral article, which does give you the count, and the two perform pretty similarly. Adding the cost of count query to the seek method however made the grail query the clear winner. – EricS Aug 18 '14 at 22:47

Try something like this:

declare @page int = 2
declare @size int = 10

declare @lower int =  (@page - 1) * @size
declare @upper int =  (@page    ) * @size

select * from (
    select
        ROW_NUMBER() over (order by some_column) as lfd,
        *
    from your_table
) as t
where lfd between @lower and @upper
order by some_column
bernd_k
  • ROW_NUMBER is known to have performance issues with very large result sets: http://www.4guysfromrolla.com/webtech/042606-1.shtml – HTTP 410 Dec 05 '10 at 22:31
  • Shouldn't it be `declare @lower int = (@page * @size) - (@size - 1)`? – Reid Evans Dec 21 '12 at 17:27
  • @RoadWarrior: Interesting link. I've added it to my recent [blog post](http://blog.jooq.org/2013/10/26/faster-sql-paging-with-jooq-using-the-seek-method/) about faster paging. Do you know about the [seek method](http://stackoverflow.com/a/19609594/521799)? It doesn't allow for true indexed offsets, but can jump to the "next" page in constant time. – Lukas Eder Oct 26 '13 at 17:25
  • In testing, @ReidEvans is correct, it should be **(@page * @size) - (@size - 1)**. The **(@page - 1) * @size** returns rows 1-10, then 10-20 instead of 11-20 as expected. – Rentering.com Nov 09 '15 at 13:29
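As the comments note, the lower bound as declared returns eleven rows per page and overlaps adjacent pages; a quick check of both formulas:

```python
def bounds_original(page, size):
    # Lower bound as written in the answer above.
    return (page - 1) * size, page * size

def bounds_fixed(page, size):
    # Lower bound suggested in the comments.
    return (page * size) - (size - 1), page * size

lo, hi = bounds_original(2, 10)
print(lo, hi, hi - lo + 1)  # 10 20 11 -- BETWEEN is inclusive: one row too many
lo, hi = bounds_fixed(2, 10)
print(lo, hi, hi - lo + 1)  # 11 20 10
```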

Here's an updated version of @RoadWarrior's code, using TOP. Performance is identical, and extremely fast. Make sure you have an index on TestTable.ID

CREATE PROC dbo.PagingTest
    @SkipRows int,
    @GetRows int
AS
DECLARE @FirstId int

SELECT   TOP (@SkipRows) 
         @FirstId = [Id]
FROM     dbo.TestTable
ORDER BY [Id]

SELECT   TOP (@GetRows) *
FROM     dbo.TestTable
WHERE    [Id] >= @FirstId
ORDER BY [Id]

GO 
Lane

Try this

Declare @RowStart int, @RowEnd int;


SET @RowStart = 4;
SET @RowEnd = 7; 

With MessageEntities As 
(
    Select ROW_NUMBER() Over (Order By [MESSAGE_ID]) As Row, [MESSAGE_ID]
    From [TBL_NAFETHAH_MESSAGES]
)
Select  m0.MESSAGE_ID, m0.MESSAGE_SENDER_NAME,
        m0.MESSAGE_SUBJECT, m0.MESSAGE_TEXT
From MessageEntities M
    Inner Join [TBL_NAFETHAH_MESSAGES] m0 on M.MESSAGE_ID = m0.MESSAGE_ID
Where M.Row Between @RowStart AND @RowEnd
Order By M.Row Asc
GO
Aalkhodiry

Why not use the recommended solution? Entity SQL (Entity Framework) supports paging directly via SKIP and LIMIT:

SELECT VALUE product
FROM AdventureWorksEntities.Products AS product
ORDER BY product.ListPrice
SKIP @skip LIMIT @limit