SQL Server 2008 Paged Row Retrieval and Large Tables

Question

I'm using SQL Server 2008 and the following query to implement paged data retrieval from our JSF application, in below code i am retrieving 25 rows at a time sorted by the default sort column in DESC order.

SELECT * FROM 
    (  
        SELECT TOP 25 * FROM 
        ( 
            SELECT TOP 25  ...... WHERE CONDITIONS 
            --ORDER BY ... DESC
        )AS INNERQUERY  ORDER BY INNERQUERY.... ASC
    ) 
AS OUTERQUERY 
ORDER BY OUTERQUERY.... DESC

It works, but with one obvious flow. If the users request to see the last page and there are over 10 million records in table, then the second TOP Query will have to first retrieve the 10 million records and only then the first top Query will pick out the Top 25 which will look like:

SELECT * FROM 
    (  
        SELECT TOP 25 * FROM 
        ( 
            SELECT TOP 10000000  ...... WHERE CONDITIONS 
            --ORDER BY ... DESC
        )AS INNERQUERY  ORDER BY INNERQUERY.... ASC
    ) 
AS OUTERQUERY 
ORDER BY OUTERQUERY.... DESC

I looked into replacing the above with ROW_NUMBER OVER(....) but seemingly i had the same issue where the second TOP statement will have to get the entire result and only then you can do a where ROW_NUMBER between x and y.

Can you please point out my mistakes in the above approach and hints on how it can be optimized?

Here's a good example of row_number that you may find useful: http://blogs.x2line.com/al/archive/2005/11/18/1323.aspx — ron tornambe, Mar 23 '12 at 00:07
thanks for the link, i had a look but as you can see yourself, the first SELECT Statement will retrieve ALL the rows and only then you pick out a range using `between x and y`in the second SELECT. My problem is that the first Select Statement takes a while to return because of large number of records. — ke3pup, Mar 23 '12 at 00:13
Have a look at the this SO link and its reference to Greg Hamilton's article 'A More Efficient Method for Paging Through Large Result Sets' http://stackoverflow.com/questions/2308220/is-there-any-performance-issue-using-row-number-to-implement-table-paging-in-sql — ron tornambe, Mar 23 '12 at 00:36
Does your table have a lot of columns? When you say the "default sort column", is it a clustered primary key column? Does the data in this table change very often? Do changes happen in rows in between or you usually make insertions at the end of the table? Have you checked your theroy looking at the execution plan? — JotaBe, Mar 23 '12 at 00:41
By the way, many times the problem is trying to provide the user with something which isn't definitely neccesary, useful or reasonable (yep, I know users ask this kind of things all the time!). No one would ever page through 10 million rows in groups of 25 (nor 500). He would starve and die before the 1000th iteration. So, why not offering paging from the start and from the end, if he really needs to? Can you make him understand he's asking something insane?I usually convince my customers to have this kind os sensible limitations on their queries. — JotaBe, Mar 23 '12 at 00:48
@JotaBe Thanks for the comments. there are about 45 columns retrieved both from the main queried table and the other 4 joined Tables. The data in 3 of the tables in query is constantly changed. The Default sort column is the latest "Updated Date" which has a Non-clustered Index on it. I am thinking much the same as you are by removing the "Go to First" and "Go to Last" paging buttons on the page so user will `only be going from page 1 to page 1+- ` . Hoping i can this "Design Change" approved. — ke3pup, Mar 23 '12 at 01:00
@techventure I think we need to see the inner query so that we can determine if this is even possible. — cctan, Mar 23 '12 at 01:11
One option which most customers usually accept is being able to change the order, or having some kind of filter, or both, and limiting their paging to the 1000 or 2000 first records. You're in the worst scenario: data that changes constantly, large table, and non-clustered index... ouch!! That avoids from maintaining auxiliary tables or structures to help the query. It's also important that your non-clustered index is maintained to avoid rfagmentation as much as possible. — JotaBe, Mar 23 '12 at 01:15

score 0 · Answer 1 · answered Feb 24 '13 at 13:03

we can improve above query a bit more. If I assume that @current_index is the current page number then we can rewrite the above query as:

WITH PAGED_QRY (
               SELECT top (@current_index * @rows_to_retrieve) *, ROW_NUMVER() 
                 OVER(ORDER BY Y) AS ROW_NO
               FROM TABLE WHERE ....
           ) 
 SELECT TOP @ROWS_TO_RETRIEVE FROM PAGED_QRY
 ORDER BY ROW_NO DESC

In this case, our inner query will not return the whole record set. Suppose our page_index is 3 & page_size is 50, then it will select only 150 rows(even if our table contains hundreds/thousands/millions of rows) & we can skip the where clause also.

score 0 · Accepted Answer · answered Apr 01 '12 at 09:50

I'm currently using the following to code to retrieve subset of rows:

WITH PAGED_QRY (
                   SELECT *, ROW_NUMVER() OVER(ORDER BY Y) AS ROW_NO
                   FROM TABLE WHERE ....
               ) 
SELECT * FROM PAGED_QRY WHERE ROW_NO BETWEEN @CURRENT_INDEX and @ ROWS_TO_RETRIEVE
ORDER BY ROW_NO

where @current_index and @rows_to_retrieve (ie. 1 and 50) are your paging variables. it's cleaner and easier to read.

I've also tried using SET ROW_COUNT @ROWS_TO_RETRIEVE but doesn't seem to make much difference.

Using above query and by carefully studying the execution path of the query and modifying/creating indexes and statistics I've reached results that are sufficiently satisfactory, hence why i'm making this as the answer. The original goal of retrieving only the required rows in the inner query seems to be not possible yet, if you do find the way please let me know.

SQL Server 2008 Paged Row Retrieval and Large Tables

2 Answers2