2

I am trying to build a pagination mechanism. I am using a ORM that creates SQL looking like this:

SELECT * FROM 
    (SELECT t1.colX, t2.colY
         ROW_NUMBER() OVER (ORDER BY t1.col3) AS row 
     FROM Table1 t1
     INNER JOIN Table2 t2
     ON t1.col1=t2.col2
    )a 
WHERE row >= n AND row <= m

Table1 has >500k rows and Table2 has >10k records

I execute the queries directly in the SQL Server 2008 R2 Management Studio. The subquery takes 2-3sec to execute but the whole query takes > 2 min.

I know SQL Server 2012 accepts the OFFSET .. LIMIT .. option but I cannot upgrade the software.

Can anyone help me in improving the performance of the query or suggest other pagination mechanism that can be imposed through the ORM software.

Update:

Testing Roman Pekar's solution (see comments on the solution) proved that ROW_NUMBER() might not be the cause of the performance problems. Unfortunately the problems persist.

Thanks

Community
  • 1
  • 1
ikabott
  • 41
  • 1
  • 7
  • 1
    Did you look the execution plan? Is this a possible [duplicate](http://stackoverflow.com/questions/2135418/equivalent-of-limit-and-offset-for-sql-server)? – Luis LL Aug 05 '13 at 15:59
  • @LuisLL This doesn't look like a duplicate to me. His query is almost identical to the accepted answer in that question, but it doesn't perform well, so he's asking for something *better*. That's a new question. – ErikE Aug 05 '13 at 16:10
  • Thans @ErikE, but what about execution plan? – Luis LL Aug 05 '13 at 16:12
  • What *about* execution plan? I addressed only the duplicate part. – ErikE Aug 05 '13 at 16:13
  • You may be able to select top m but not sure and only a comment – paparazzo Aug 05 '13 at 17:21
  • 2
    What type is col3? Is it indexed? Do you have an identity column on the table? Is t2.col2 a PK by any chance? – paparazzo Aug 05 '13 at 17:28
  • You should add the table structures including any indexes and foreign key constraints. Without it we have to assume stuff like that col2 is the primary key of table2 and that col1 is a foreign key to table2 and does not allow null values and has a non clustered index. It is better not to have to assume things like that. Also if you want help with the ORM it might be a good thing if you mentioned what it is you are using. – Mikael Eriksson Aug 05 '13 at 19:59
  • t1.col1 is a FK (NOT NULL), t2.col2 is PK and t1.col3 is int IDENTITY(1,1) NOT NULL (set as simple index and also combined with a datetime column). The ORM is Lightspeed by Mindscape, but it is a SQL Server issue and the ORM is just creating the SQL to execute. – ikabott Aug 06 '13 at 09:15
  • 1
    Have a look at this [SQL Fiddle](http://sqlfiddle.com/#!3/80a76/1). It is another way to write your query as I understand it. You have to test on your data to see if it returns what you want, if it is efficient and if you can make your ORM produce it. `Table2` is a lookup table it will not contribute with additional rows and since `col2` is `not null` in `Table1` it will not be responsible for removing any rows returned from `Table1`. So you can apply `row_number` on `Table1` before the join to `Table2` which means that you only have to find the rows in `Table2` you actually need. – Mikael Eriksson Aug 06 '13 at 18:14
  • Fiddle above takes some time to load because it creates table with 500k and 10k rows. [Here is a version with fewer rows](http://sqlfiddle.com/#!3/83dd0/1). If this works for you I will be happy to convert the comment to an answer instead. – Mikael Eriksson Aug 07 '13 at 09:27
  • I am not able to reproduce my issue in your Fiddle but I cannot find any difference with my DB structure. I am out of ideas. – ikabott Aug 08 '13 at 10:12
  • Can you make your ORM create the query I use in the fiddle? – Mikael Eriksson Aug 08 '13 at 10:13
  • Added the comments above as an answer instead. – Mikael Eriksson Aug 08 '13 at 10:21

3 Answers3

3

As I understand your table structure from comments.

create table Table2
(
  col2 int identity primary key,
  colY int
)

create table Table1
(
  col3 int identity primary key,
  col1 int not null references Table2(col2),
  colX int
)

That means that the rows returned from Table1 can never be filtered by the join to Table2 because Table1.col1 is not null. Neither can the join to Table2 add rows to the result since Table2.Col2 is the primary key.

You can then rewrite your query to generate row numbers on Table1 before the join to Table2. And the where clause is also applied before the join to Table2 meaning that you will only locate the rows in Table2 that is actually part of the result set.

select T1.colX,
       T2.colY,
       T1.row
from
    (
    select col1,
           colX,
           row_number() over(order by col3) as row
    from Table1
    ) as T1
  inner join Table2 as T2
    on T1.col1 = T2.col2
where row >= @n and row <= @m

SQL Fiddle

I have no idea if you can make your ORM (Lightspeed by Mindscape) to generated the paging query like this instead of what you have now.

The query plan from this answer: enter image description here

The query plan using the query in the question: enter image description here

There is a huge difference in reads between the two.

enter image description here

Mikael Eriksson
  • 136,425
  • 22
  • 210
  • 281
  • Let's say I add a WHERE condition in the subquery: `WHERE colX LIKE 'blabla%'`. The number of Table1 entries returned is 3. Now assume n=0. If m<=3 the total query executes in seconds if m>3 the total query executes in minutes. – ikabott Aug 08 '13 at 11:39
  • @ikabott Do you have a supporting index on `Table1.colX`? – Mikael Eriksson Aug 08 '13 at 12:17
  • @ikabott You mean like in this [SQL Fiddle](http://sqlfiddle.com/#!6/e8170/1)? I don't see that behavior at all with my query. Did you test this on my query or your query? – Mikael Eriksson Aug 08 '13 at 16:51
  • Your SQL Fiddle does not show the behaviour with your query, but if I apply your query to my database, it shows the same performance issues as the old query. I do not know what might be the difference from your created tables to my tables, but something must provoke the divergence in behaviour. – ikabott Aug 12 '13 at 09:28
  • @ikabott Can you post the XML version of the queryplan? Here in the question or link somewhere where it can be accessed. There might be things in there that could tell what the difference is. – Mikael Eriksson Aug 12 '13 at 09:45
  • [here](https://www.dropbox.com/s/gbcb8zm85ksw35i/ExecutionPlan%20_NU.sqlplan) you have the execution plan. In my DB the query returns a maximum of 3 results. If the page size is 2 it takes 33s, if it is 3 it takes 1:30min and if it is >=4 it takes 3min – ikabott Aug 19 '13 at 09:39
  • @ikabott The query you have used here is quite different from the query in the question. You have no `Table2` in it and the whole point of my answer was in how you handled `Table2` and the relationship between `Table2` and `Table1`. The query you have here use 1 table and a `like` predicate where you start of by a `%` (can't do seek with that). What would speed up this query is if you included `colY` to the index `ix_table1_col3` as an include column. That will remove the Key lookup operator from your query plan and your query will be a lot faster. – Mikael Eriksson Aug 19 '13 at 09:48
  • Thanks @Mikael Erikson, this definitely worked. I also realized the LIKE '%something' was the bottle neck when I saw the execution plan. – ikabott Aug 19 '13 at 13:33
1

Insert just the primary key column(s) of the paginated table into a temp table with an identity column, ordering by the ordered-by columns. (You may have to include the ordered-by columns to ensure the ordering comes out right.) Then, join back to the main table using the temp table as a key for the rows you want. If the data is fairly static, you could save the ordering data to a session-keyed permanent table instead of a temp table, and reuse it for a short period of time (so subsequent page requests within a few minutes are nearly instant).

Row_Number() tends to perform well with small sets of data, but it can hit serious performance snags once you get some serious rows, as you have with 500k.

ErikE
  • 48,881
  • 23
  • 151
  • 196
0

I suggest you to check indexes on your tables. I think it'll help your query if you at least have index on col2 on table2. You could also try to rewrite your query like

;with cte1 as (
    select top (@m) t1.colX, t2.colY, t1.col3
    from Table1 as t1
        inner join Table2 as t2 on t1.col1=t2.col2
    order by t1.col3 asc
),
cte2 as (
    select top (@m - @n + 1) *
    from cte1
    order by col3 desc
)
select *
from cte2 as t1

but it could still be slow if you don't have indexes

Roman Pekar
  • 107,110
  • 28
  • 195
  • 197
  • It's allowed if you specify top n – Roman Pekar Aug 06 '13 at 09:43
  • Interesting finding. If the total number of found rows (there is a condition in the cte1 table) is >= @m-@n+1 the query takes <10s but if @m-@n+1>foundRows the complete query takes more than 2min to execute. Maybe the ROW_NUMBER() was not the problem after all. Any Idea? – ikabott Aug 06 '13 at 11:40
  • Yes, idea is my answer. I've tested both queries yesterday and they work about the same if there was index on `col2`. I've tested it on about 500K and 10K rows and both worked ~1s – Roman Pekar Aug 06 '13 at 11:53
  • The query runs as you say, but if I apply a WHERE condition in the first part of the query that makes cte1 have less rows than @m-@n+1 (less than the row size) then and only then the query takes very long – ikabott Aug 08 '13 at 09:49
  • I'll try to check it on my data later – Roman Pekar Aug 08 '13 at 10:00