
I have a table with a structure like:

  • Id (serial int) (index on this)
  • Post (text)
  • ...
  • CreationDate (DateTime) (Desc index on this)

I need to implement pagination. My simple query looks like:

SELECT Id, Post, etc FROM Posts ORDER BY CreationDate desc OFFSET x LIMIT 15

With relatively few records (below 1 million) performance is somewhat bearable, but as the table grows the slowdown becomes noticeable.

Setting aside the fact that it is good to tune DB settings like cache size, work memory, cost, shared memory, etc., what can be done to improve performance, and what are the best practices for pagination with Postgres? There is something similar asked here, but I am not sure whether it can be applied in my case too.
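To make the `OFFSET` approach concrete, here is a minimal runnable sketch using an in-memory SQLite database as a stand-in for Postgres (the table and column names mirror the question; the data is invented for illustration). The key point is that the database must still walk past all the skipped rows in sort order before it can return the page, which is why cost grows with the offset:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Posts (Id INTEGER PRIMARY KEY, Post TEXT, CreationDate TEXT)")
# 100 toy rows; CreationDate increases with Id so the two orderings match.
conn.executemany(
    "INSERT INTO Posts (Post, CreationDate) VALUES (?, ?)",
    [(f"post {i}", f"2013-01-01 00:00:00.{i:03d}") for i in range(100)],
)

page, page_size = 3, 15
rows = conn.execute(
    "SELECT Id, Post FROM Posts ORDER BY CreationDate DESC LIMIT ? OFFSET ?",
    (page_size, page * page_size),
).fetchall()
# The engine generated and discarded the first page*page_size = 45 rows
# before returning these 15 -- that discarded work grows with the page number.
print([r[0] for r in rows])
```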

Since my Id is auto-incremented (and therefore predictable), another option I was considering is something like this:

SELECT Id, Post...FROM Posts WHERE Id > x and Id < y

But this seems to complicate things: I have to get the count of records all the time, and besides, it is not guaranteed that I will always get 15 records (for example, if one of the posts has been deleted, the Ids are no longer in a "straight" sequence).

I was thinking about a CURSOR too, but if I am not mistaken a CURSOR will keep the connection open, which is not acceptable in my case.

NDeveloper

1 Answer


Pagination is hard; the RDBMS model isn't well suited to large numbers of short-lived queries with stateful scrolling. As you noted, resource use tends to be too high.

You have these options:

  • LIMIT and OFFSET
  • Using a cursor
  • Copying the results to a temporary table or into memcached or similar, then reading it from there
  • x > id and LIMIT

Of these, I prefer x > id with a LIMIT. Just remember the last ID you saw and ask for the ones after it. If you have a monotonically increasing sequence this will be simple and reliable, and for simple queries it'll be efficient.

Craig Ringer
  • Thanks for the answer. I agree x > id would be the most efficient option, assuming that most people will be checking the most recent data anyway. But if I am not wrong, there is an issue with it, since there are constant inserts into the table and the Id keeps increasing. This means I have to get the max id all the time, and I still can't rely on it, since if some posts are deleted the Ids will not be sequential (I mean, the step between Ids will not always be 1)... – NDeveloper Dec 03 '13 at 01:32
  • @NDeveloper You are paginating, right? You just need to remember the ID you last saw in the session. So (say) you first run `SELECT ... FROM ... WHERE id > 0 ORDER BY id LIMIT 20`. You might get a result set with IDs 1, 3, 4, 5, ... 28. So your next query, for the next page, is `SELECT ... FROM ... WHERE id > 28 ORDER BY id LIMIT 20`. If you expect to mostly do *reverse* pagination you'd do your first query as `... ORDER BY id DESC LIMIT 20` and your next as `WHERE id < 99123 ORDER BY id DESC LIMIT 20` (where 99123 is the lowest ID you saw in your last query). – Craig Ringer Dec 03 '13 at 01:38
  • Yes you are right, that makes sense. The only issue with this approach is that if the Ids are not sequential the same record can appear several times. – NDeveloper Dec 03 '13 at 02:00
  • @NDeveloper If the IDs aren't sequential then it isn't a useful approach. It's only good when you can `ORDER BY` a unique, non-repeating key for pagination. You can use multiple keys, if you want to sort first by a non-unique key, then use a unique key as a tiebreaker though. – Craig Ringer Dec 03 '13 at 02:04
  • Actually Ids are unique... And I see how this can work... Here is an example: let's assume I have SELECT ... ORDER BY Id DESC LIMIT 20 and my lowest Id here is 99123, so the next select (page 2) should look like SELECT ... WHERE Id < 99123 ORDER BY Id DESC LIMIT 20. What I am concerned about, though, is whether it will be slow. When I run Id < x, will it not scan the whole table? – NDeveloper Dec 03 '13 at 02:51
  • @NDeveloper No, it'll do a backwards index scan on the index that is used to enforce the uniqueness of `id`, as you'd see if you *tried it* with `EXPLAIN`. For best performance if you're doing this a lot you may want a secondary index on `mytable(id DESC)`, but that'll add to insert/update/delete/vacuum costs, so it's a balance. – Craig Ringer Dec 03 '13 at 02:58
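Since the original query orders by CreationDate (which can repeat), here is a sketch of the multi-key variant Craig mentions: page on (CreationDate, Id), using the unique Id as the tiebreaker. SQLite again stands in for Postgres, and the `OR` form of the predicate is used because it works everywhere; in Postgres the equivalent row-value form `WHERE (CreationDate, Id) < (?, ?)` also works and pairs naturally with an index on `(CreationDate DESC, Id DESC)`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Posts (Id INTEGER PRIMARY KEY, CreationDate TEXT)")
# Ten posts share each creation date, so Id alone must break the tie.
conn.executemany("INSERT INTO Posts (Id, CreationDate) VALUES (?, ?)",
                 [(i, f"2013-12-{i // 10 + 1:02d}") for i in range(1, 41)])

def fetch_page(last_date=None, last_id=None, page_size=10):
    """Next page of posts strictly 'before' (last_date, last_id), newest first."""
    if last_date is None:
        cur = conn.execute(
            "SELECT Id, CreationDate FROM Posts "
            "ORDER BY CreationDate DESC, Id DESC LIMIT ?", (page_size,))
    else:
        cur = conn.execute(
            "SELECT Id, CreationDate FROM Posts "
            "WHERE CreationDate < ? OR (CreationDate = ? AND Id < ?) "
            "ORDER BY CreationDate DESC, Id DESC LIMIT ?",
            (last_date, last_date, last_id, page_size))
    return cur.fetchall()

page1 = fetch_page()
last_id, last_date = page1[-1]
page2 = fetch_page(last_date=last_date, last_id=last_id)
```

Because the (CreationDate, Id) pair is unique even when dates repeat, the pages stay stable: no row is skipped or shown twice at the page boundary.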