
I am looking at reworking our website's reporting pages to be faster, and I am on the fence about how I should implement paging. Our database is large, with more than 150 million records. Most of our reports require complicated data drawn from 5 to 10 tables, so each query may have 5 or 6 joins and some inner selects. Obviously, these are not fast queries.

To implement paging on the database side, each web request must query the database for the rows of the current page (say 100 of 10,000), but it must also query the database a second time for the total number of possible rows. As a result, I am essentially running the whole query twice, because the query that gets the total record count still has to perform all of the joins and inner selects to determine that total.
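The two-round-trip pattern described above looks roughly like this (a language-agnostic sketch in Python; the query strings and the `run_query` helper are hypothetical stand-ins for the real data-access code, not anything from the actual application):

```python
import math

PAGE_SIZE = 100

def fetch_page(page_number, run_query):
    """Two round trips: one for the page of rows, one for the total count."""
    # Round trip 1: rows for the requested page (query text is illustrative).
    rows = run_query("SELECT ... WHERE r BETWEEN :lo AND :hi",
                     lo=(page_number - 1) * PAGE_SIZE + 1,
                     hi=page_number * PAGE_SIZE)
    # Round trip 2: the count query still performs all the same joins
    # and inner selects just to produce one number.
    total = run_query("SELECT COUNT(*) FROM ( ... )")
    total_pages = math.ceil(total / PAGE_SIZE)
    return rows, total, total_pages
```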

Would it not be better to run the query once, return all results, cache them in session, and page through them in web code? I know I am initially pulling more data, but I am only running a query that can take 30 to 60 seconds once instead of twice.
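The cache-and-slice alternative amounts to this (a minimal sketch; the `session` dictionary and `run_big_query` helper are hypothetical placeholders for session state and data access):

```python
PAGE_SIZE = 100

def get_page(session, page_number, run_big_query):
    """Run the expensive query once, cache all rows, then page in memory."""
    if "report_rows" not in session:
        # The slow 30-60 second query runs only on the first request.
        session["report_rows"] = run_big_query()
    rows = session["report_rows"]
    start = (page_number - 1) * PAGE_SIZE
    return rows[start:start + PAGE_SIZE]
```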

This is kind of a technology generic question, but in case it matters, I am using .net 4.0 and Oracle 11g.

Zach Green

3 Answers


From my experience, paging is always faster if left to the database. After all, a database is built to query and manipulate massive amounts of data.

If you return large amounts of data to .NET and "cache" it in session, you will quickly run out of memory on your server.
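The memory concern is easy to quantify with a back-of-envelope estimate (the per-row size and user count below are illustrative assumptions, not measurements from the question):

```python
# Rough session-cache footprint: rows_per_user * bytes_per_row * users.
ROWS_PER_USER = 10000      # the 10,000-row report from the question
BYTES_PER_ROW = 1000       # assumed ~1 KB per row across 5-10 joined tables
CONCURRENT_USERS = 200     # assumed concurrent report users

total_bytes = ROWS_PER_USER * BYTES_PER_ROW * CONCURRENT_USERS
print(total_bytes / 1024 ** 3)  # ~1.9 GB of session state
```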

Mutt

You could get the number of rows at the same time as your paged rows with an analytic function, like this:

SELECT [cols], nb_rows
  FROM (SELECT [cols], nb_rows, rownum r
          FROM (SELECT [cols], count(*) over() nb_rows
                  FROM [your_query])
         WHERE rownum <= :M)
 WHERE r >= :N

This ensures you run the query only once, and it will be easier on your network bandwidth.

For further analysis, see Speed of paged queries in Oracle.

Caching the result of the whole query might make sense if:

  • Users regularly move forward/backward through the result set without the need to refresh the data
  • Network bandwidth and available memory (application side) are sufficient (most likely this would only be possible if the number of simultaneous users is kept small)
Vincent Malgrat
  • So if my query returns 10,000 rows, this query will call 'count(*) over() nb_rows' for each row? Should I worry about that, or is the time required for it to run that count negligible? – Zach Green May 27 '11 at 13:08
  • @Zach: this is an analytical function, it will be computed once per "window" (here only once). – Vincent Malgrat May 27 '11 at 13:40

Do it in the db. For Oracle, you might try something like:

select * 
  from ( select a.*, rownum r 
           from ( select *
                    from t
                   where x = :host_variable
                   order by y ) a
          where rownum <= :HigherBound )
 where r >= :LowerBound

where LowerBound and HigherBound define your page bounds (for page 1 showing 10 per page, you'd have lower = 1 and higher = 10).
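The bound arithmetic generalizes to any page (a trivial sketch; `page` and `page_size` are the only inputs):

```python
def page_bounds(page, page_size):
    """1-based inclusive rownum bounds for a given page."""
    lower = (page - 1) * page_size + 1
    higher = page * page_size
    return lower, higher
```

So page 1 with 10 per page gives (1, 10), and page 3 gives (21, 30).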

The trick here is:

  1. Make sure your inner select with order by is fairly fast (uses proper indexes).
  2. Use of rownum lets Oracle do a stopkey, which helps limit rows it needs to process.

As for your situation, if you have a complicated query that takes time to run and returns a large amount of data, you'll definitely want to create a materialized view first, add an index or two to support queries on the snapshot table, and then do a complete refresh when needed. Your query above would then run against the materialized view, NOT the base tables.

tbone
  • I think the materialized view is probably a nice idea for some situations, but it will not work for us because of our requirements. Users come into our site, add items, and immediately run reports to see the items (and the item's pedigree) that they have access to see (determining that access is part of what complicates and slows our queries because a user can have or not have access to items based on a long list of rules that have to be checked). – Zach Green May 27 '11 at 14:10
  • I see, I thought this was users running fairly static reports (refreshed once or twice a day). It sounds like this is more of a db design issue than a pagination issue then; if your requirements are as such, and to meet them you need to run a 60-second query each time for each user, pagination might be the least of your worries. Sorry. – tbone May 27 '11 at 14:27
  • Agreed! For 95% of users the queries are quick, but some users have an insane amount of data. – Zach Green May 27 '11 at 14:57