39

I have read in various Internet resources that a query becomes slower as the offset increases, but in my case I think it is far too slow. I am using Postgres 9.3.

Here is the query (id is primary key):

select * from test_table offset 3900000 limit 100;

It returns the data in around 10 seconds, which I think is much too slow. I have around 4 million records in the table, and the overall size of the database is 23 GB.

Machine configuration:

RAM: 12 GB
CPU: 2.30 GHz
Core: 10

A few values from the postgresql.conf file that I have changed are listed below; the others are defaults.

shared_buffers = 2048MB
temp_buffers = 512MB
work_mem = 1024MB
maintenance_work_mem = 256MB
dynamic_shared_memory_type = posix
default_statistics_target = 10000
autovacuum = on
enable_seqscan = off   ## it's not having any effect; the explain output still shows a seq scan

Apart from these, I have also tried changing random_page_cost = 2.0 and cpu_index_tuple_cost = 0.0005, and the result is the same.

The EXPLAIN (ANALYZE, BUFFERS) result for the query is below:

"Limit  (cost=10000443876.02..10000443887.40 rows=100 width=1034) (actual time=12793.975..12794.292 rows=100 loops=1)"
"  Buffers: shared hit=26820 read=378984"
"  ->  Seq Scan on test_table  (cost=10000000000.00..10000467477.70 rows=4107370 width=1034) (actual time=0.008..9036.776 rows=3900100 loops=1)"
"        Buffers: shared hit=26820 read=378984"
"Planning time: 0.136 ms"
"Execution time: 12794.461 ms"

How do people deal with this problem in Postgres? Any alternative solution would be helpful as well.

UPDATE: Adding order by id (I also tried with another indexed column), here is the explain output:

"Limit  (cost=506165.06..506178.04 rows=100 width=1034) (actual time=15691.132..15691.494 rows=100 loops=1)"
"  Buffers: shared hit=110813 read=415344"
"  ->  Index Scan using test_table_pkey on test_table  (cost=0.43..533078.74 rows=4107370 width=1034) (actual time=38.264..11535.005 rows=3900100 loops=1)"
"        Buffers: shared hit=110813 read=415344"
"Planning time: 0.219 ms"
"Execution time: 15691.660 ms"
Sabuj Hassan
  • Do you want random rows? Because there can be faster solutions for that depending on other conditions. – Clodoaldo Neto Oct 29 '14 at 11:08
  • [unrelated] `work_mem = 1024MB` is probably too high, `default_statistics_target = 10000` is way too high for general use. `autovacuum = off` is not needed and dangerous. How long have you been running with autovacuum off? – wildplasser Oct 29 '14 at 11:09
  • @wildplasser `autovacuum` was always on. I set it to `off` as an experiment before running the explain. – Sabuj Hassan Oct 29 '14 at 11:38
  • @ClodoaldoNeto I always have an order by (last_update_date) at the end of my query. For simplicity I removed it from the question. – Sabuj Hassan Oct 29 '14 at 11:39
  • Removing things does not add simplicity, it adds confusion. Please show the real query, plus the table definition: is the pk {id, update_date}? BTW, a similar query runs here in `Total runtime: 12.395 ms`. You must have a different table structure than what I can read from your question. – wildplasser Oct 29 '14 at 11:48
  • @wildplasser I have already added the analyze output for `order by id`. My `id` is the primary key. – Sabuj Hassan Oct 29 '14 at 11:54
  • How sparse are the ids? Can a row be deleted? How often? The `test_table` has no gaps, but in production it will happen. The table definition would really help; otherwise it is a guessing exercise. – Clodoaldo Neto Nov 25 '14 at 09:53
  • @ClodoaldoNeto Rows can be deleted. For example, it can happen that 6 out of 10 rows with ids 1-10 are deleted. That's why I cannot use `id >` or `id <` at this moment. – Sabuj Hassan Nov 25 '14 at 10:38
  • What is the typical use case? Will the user go straight to the nth page and leave or will he keep paginating forward? – Clodoaldo Neto Nov 25 '14 at 11:33
  • @SabujHassan what is the size of the table itself (not the database)? Also, do you use some other columns (except created date and ID) in your query? – Hirurg103 Jul 17 '18 at 13:46
  • Does this answer your question? [Improving OFFSET performance in PostgreSQL](https://stackoverflow.com/questions/6618366/improving-offset-performance-in-postgresql) – Mohammed Essehemy Jan 22 '23 at 23:18

10 Answers

93

It's slow because it needs to locate the top offset rows and scan the next 100. No amount of optimization will change that when you're dealing with huge offsets.

This is because your query literally instructs the DB engine to visit lots of rows by using offset 3900000 -- that's 3.9M rows. There aren't many options to speed this up.

Super-fast RAM, SSDs, etc. will help. But you'll only gain by a constant factor in doing so, meaning it's merely kicking the can down the road until the offset grows large enough again.

Ensuring the table fits in memory, with plenty more to spare, will likewise help by a larger constant factor -- except the first time. But this may not be possible with a large enough table or index.

Ensuring you're doing index-only scans will work to an extent. (See velis' answer; it has a lot of merit.) The problem here is that, for all practical purposes, you can think of an index as a table storing a disk location and the indexed fields. (It's more optimized than that, but it's a reasonable first approximation.) With enough rows, you'll still run into problems with a large enough offset.

Trying to store and maintain the precise position of the rows is bound to be an expensive approach too. (This is suggested by e.g. benjist.) While technically feasible, it suffers from limitations similar to those that stem from using MPTT with a tree structure: you'll gain significantly on reads but will end up with excessive write times when a node is inserted, updated or removed in such a way that large chunks of the data need to be updated alongside.

As is hopefully clear by now, there isn't any real magic bullet when you're dealing with offsets this large. It's often better to look at alternative approaches.

If you're paginating based on the ID (or a date field, or any other indexable set of fields), a potential trick (used by blogspot, for instance) would be to make your query start at an arbitrary point in the index.

Put another way, instead of:

example.com?page_number=[huge]

Do something like:

example.com?page_following=[id]

That way, you keep a trace of where you are in your index, and the query becomes very fast because it can head straight to the correct starting point without plowing through a gazillion rows:

select * from foo where ID > [id] order by ID limit 100

Naturally, you lose the ability to jump to e.g. page 3000. But give this some honest thought: when was the last time you jumped to a huge page number on a site instead of going straight for its monthly archives or using its search box?
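
Since the comments mention that the real query actually orders by last_update_date, which may not be unique, here is a sketch of the same keyset trick with id as a tie-breaker (column names are assumed from the comments, not from your actual schema):

-- :last_update_date and :last_id are the values from the last row of the previous page
select *
from test_table
where (last_update_date, id) > (:last_update_date, :last_id)
order by last_update_date, id
limit 100;

-- backed by a composite index, e.g.:
create index test_table_lud_id_idx on test_table (last_update_date, id);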

If you're paginating but want to keep the page offset by any means, yet another approach is to forbid the use of large page numbers. It's not silly: it's what Google does with search results. When running a search query, Google gives you an estimated number of results (you can get a reasonable number using explain), and then allows you to browse only the top few thousand results -- nothing more. Among other things, they do so for performance reasons -- precisely the one you're running into.
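
For the "estimated number of results" part of that pattern, the planner's statistics can give a cheap approximation instead of a full count(*); a sketch, assuming autovacuum/ANALYZE keeps the statistics reasonably fresh:

-- rough row-count estimate maintained by ANALYZE; no table scan involved
select reltuples::bigint as estimated_rows
from pg_class
where relname = 'test_table';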

Denis de Bernardy
  • 4
    This should be the fastest solution, however it may be worth mentioning that random access to pages cannot be implemented with it, only a simple *previous* & *next* page (for pagination). – pozs Nov 27 '14 at 15:21
  • @pozs: Indeed. The first paragraph states this quite clearly, I hope. There basically isn't any good option in the general case. The second one starts: "If you're paginating". If OP is in a slightly different scenario, something equivalent might apply to him and this answer will hopefully put him on the right track if there is. If not, it's a lost cause in my experience: no amount of optimization will make reading 4M rows -- in any order -- consistently fast. – Denis de Bernardy Nov 28 '14 at 00:06
  • @Denis thanks. I tried it before, but when the user wants to jump to a certain page (e.g. page 590), it does not help me. – Sabuj Hassan Nov 29 '14 at 07:46
  • 2
    @SabujHassan: For that kind of use-case, there really isn't much you can do... A huge offset will make the DB engine plow through a gazillion rows before starting to output the ones you're looking for, and no amount of wishing, praying or hoping will make the DB engine magically read fewer rows: you literally *instruct* it to read lots of them. Super-fast hardware, or ensuring you're doing index-only scans, might arguably improve things somewhat; but not materially. Ensuring the table fits in memory, with plenty more to spare, would improve things too, of course. – Denis de Bernardy Nov 29 '14 at 14:42
  • @Denis Can you please update your answer and write a line 'Huge offset jump is slow'. – Sabuj Hassan Dec 03 '14 at 08:20
9

I have upvoted Denis's answer, but will add a suggestion of my own; perhaps it can be of some performance benefit for your specific use case:

Assume your actual table is not test_table but some huge compound query, possibly with multiple joins. You could first determine the required starting id:

select id from test_table order by id offset 3900000 limit 1

This should be much faster than the original query, as it only requires scanning the index rather than the entire table. Getting this id then opens up a fast index-search option for the full fetch:

select * from test_table where id >= (what I got from previous query) order by id limit 100
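
For reference, both steps can be folded into a single statement; a sketch assuming id is the primary key, as in the question:

select t.*
from test_table t
where t.id >= (
    select id
    from test_table
    order by id
    offset 3900000
    limit 1
)
order by t.id
limit 100;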
velis
  • Sadly, plowing through 3900001 rows in an index-only scan can be slow too. :-) But it should be twice as fast, give or take, because then only one smaller table (the index) is read instead of two (the index and the actual table). – Denis de Bernardy Dec 01 '14 at 16:36
  • Actually, it should be much faster than only 2x. Even if no info about record count is stored in the index pages, index records have just two columns (and ints at that), while actual tables presumably contain many more columns, often with large varchar fields. – velis Dec 02 '14 at 17:50
  • 1
    This sent me down the right path I needed to optimize my offset query from over 10s down to <200ms on a database of 500000+ rows. The trick is to do the offset+limit as early as possible in your query. Initially, I selected from a view that contained joins and aggregates and offset on that, but the better solution was to inline the view query and change the innermost `from` from `from my_large_table` to `from (select * from my_large_table order by id offset 488000 limit 2000)`. This ensures that all the expensive operations are only done on 2000 elements at a time. – iFreilicht May 04 '23 at 14:20
4

You didn't say whether your data is mainly read-only or updated often. If you can manage to create your table in one go, and only update it every now and then (say, every few minutes), your problem will be easy to solve:

  • Add a new column "offset_id"
  • For your complete data set ordered by ID, create an offset_id simply by incrementing numbers: 1, 2, 3, 4...
  • Instead of "offset ... limit 100", use "where offset_id >= 3900000 limit 100" (a minimal sketch follows below)
benjist
  • Do note that when you delete a row, you'll also need to re-compute the `offset_id` for every subsequent row. Depending on the size of the table and how frequently rows get deleted, this can mean a lot of table writes. – Denis de Bernardy Dec 01 '14 at 16:43
  • Sure. That's why I said it must always be done "for your complete data set" - if this is feasible at all for his use case. He didn't say. – benjist Dec 01 '14 at 23:27
2

You can optimize this in two steps.

First, get the maximum id out of the first 3,900,000 records:

select max(id) from (select id from test_table order by id limit 3900000) t;

Then use this maximum id to get the next 100 records:

select * from test_table where id > {max id from previous step} order by id limit 100;

It will be faster, as both queries will do an index scan by id.

1

This way you get the rows in semi-random order. You are not ordering the results in the query, so you get the data in whatever order it is stored in the files. The problem is that when you update rows, their order can change.

To fix that you should add an order by to the query. That way the query will always return the rows in the same order. What's more, it will then be able to use an index to speed the query up.

So, two things: add an index and add an order by to the query, both on the same column. If you want to use the id column, then don't add an index (it is already indexed as the primary key); just change the query to something like:

select * from test_table order by id offset 3900000 limit 100;
Szymon Lipiński
  • Thanks, but I believe it's not working. My `id` is the `PK` and I tried `order by` with it; still not working. I have updated my question. I also tried with another indexed column, `last_update_date`, and the result is the same. – Sabuj Hassan Oct 29 '14 at 08:41
  • Returning an ordered result will obviously make it slower although it makes sense for paging, if that is what the OP wants. – Clodoaldo Neto Oct 29 '14 at 11:07
0

I don't know all the details of your data, but 4 million rows can be a little hefty. If there's a reasonable way to shard the table and essentially break it up into smaller tables, it could be beneficial.

To explain this, let me use an example. Let's say that I have a database with a table called survey_answer, and it's getting very large and very slow. Now let's say that these survey answers all come from a distinct group of clients (and I also have a client table keeping track of these clients). Then I could make it so that survey_answer doesn't hold any data itself but is a parent table, with a bunch of child tables that actually contain the data and follow the naming format survey_answer_<clientid>, meaning that I'd have child tables survey_answer_1, survey_answer_2, etc., one for each client. Then, when I need to select data for one client, I use that client's table. If I need to select data across all clients, I can select from the parent survey_answer table, but it will be slow. For getting data for an individual client, which is what I mostly do, it will be fast.
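
On PostgreSQL 9.3, this kind of per-client split would typically be done with table inheritance plus CHECK constraints (declarative partitioning only arrived in version 10); a rough sketch with illustrative column names:

create table survey_answer (
    id        bigserial,
    client_id integer not null,
    answer    text
);

-- one child table per client, each constrained to its own client_id
create table survey_answer_1 (check (client_id = 1)) inherits (survey_answer);
create table survey_answer_2 (check (client_id = 2)) inherits (survey_answer);

-- with constraint_exclusion enabled, a filter on client_id only scans the matching child
set constraint_exclusion = partition;
select * from survey_answer where client_id = 1;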

This is one example of how to break up data, and there are many others. Another example would be if my survey_answer table didn't break up easily by client, but I know that I typically only access data over a one-year period at a time. Then I could make child tables based on year, such as survey_answer_2014, survey_answer_2013, etc. If I know that I won't access more than a year at a time, I only really need to touch maybe two of my child tables to get all the data I need.

In your case, all I've been given is perhaps the id. We can break it up by that as well (though it's perhaps not as ideal). Let's say that we break it up so that there are only about 1,000,000 rows per table. Our child tables would then be test_table_0000001_1000000, test_table_1000001_2000000, test_table_2000001_3000000, test_table_3000001_4000000, etc. Instead of passing in an offset of 3900000, you'd do a little math first and determine that the table you want is test_table_3000001_4000000, with an offset of 900000 instead. So something like:

SELECT * FROM test_table_3000001_4000000 ORDER BY id OFFSET 900000 LIMIT 100;

Now if sharding the table is out of the question, you might be able to use partial indexes to do something similar, but again, I'd recommend sharding first. Learn more about partial indexes here.
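
If you go the partial-index route instead, the idea might look something like the following (the index name and id ranges are only illustrative):

-- one partial index per id range, mirroring the child tables above
create index test_table_id_3m_4m_idx on test_table (id)
    where id > 3000000 and id <= 4000000;

-- a query repeating the same range predicate can use it
select * from test_table
where id > 3000000 and id <= 4000000
order by id
offset 900000 limit 100;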

I hope that helps. (Also, I agree with Szymon Guz that you want an ORDER BY).

Edit: Note that if you need to delete rows or selectively exclude rows before getting your result of 100, then sharding by id will become very hard to deal with (as pointed out by Denis; and sharding by id is not great to begin with). But if you're 'just' paginating the data, and you only insert or edit (not a common thing, but it does happen; logs come to mind), then sharding by id can be done reasonably (though I'd still choose something else to shard on).

Trevor Young
  • This isn't a valid approach, because if rows get deleted there's no guarantee you'll end up returning the 3900000th+ rows. – Denis de Bernardy Dec 01 '14 at 16:41
  • Depends how the database is being managed. There are plenty of instances out there where data is never deleted (so that history can be kept), but rather it is 'inactivated' by changing a value in a column. I also stated that sharding by id is not ideal. Shard by more useful data if available. I just worked off of what little info was given. – Trevor Young Dec 01 '14 at 18:40
  • Sure, if you're trying to paginate the 3900000th+ live rows, there's still no guarantee it'll be in any particular partition either. If anything, it makes things worse, because you then need to read each row or have a larger index in order to ignore non-live rows. – Denis de Bernardy Dec 01 '14 at 18:44
  • Would you like me to remove the part about sharding on id? Sharding in general is still very valid to this subject, and especially so if you can use relevant data, such as from my first two examples (by client id and by year). Sharding by id I stated as a poor choice from the get go, but if there's nothing else there's ways of making it work (and it probably would be more complex than what I've put, but it gives a start). But it's best to find actual logical groupings. – Trevor Young Dec 01 '14 at 19:47
  • I guess what I'm getting at with the last comment is until I know why we're ordering by id, a good solution can't be put out. typically an id is just a unique identifier (which can sometimes be attached to a sequence and as such give some idea of insertion order, if you're lucky) and as such it usually isn't best to use it for ordering. Instead something like date or name or rating or the like should be used for order, and as such gives something better to shard on. I can only go so far with the contrived example (which only gives me an id). – Trevor Young Dec 01 '14 at 20:00
  • Yeah, but the thing is ... 4M rows in a database table is actually just a few notches above tiny. The problem is that no amount of optimization will make reading 3.9M rows fast, be it index entries or rows within the table itself; as things stand, OP's query *requires* reading that number of rows. – Denis de Bernardy Dec 01 '14 at 20:00
  • 'Tiny' is defined by the hardware the query is being run on. 4M rows is tiny on some hardware and much bigger on other hardware. And the OP's question only requires that 100 rows be read (limit 100). It's just a matter of how many more rows in addition to the 100 you have to read to ensure that you get the right 100. – Trevor Young Dec 01 '14 at 20:07
  • Err... :-) In the DB world, think of 100kB = XS, 1MB = S, 1GB = M, 1TB = L, 1PB = XL. 4M rows really is just a few notches above tiny -- regardless of hardware. OP's question requires 3.9M rows to be read *because OFFSET 3900000 LIMIT 100 tells the DB to place itself at the 3.9M-th row (as measured by finding 3.9M applicable rows) before returning the next 100*. – Denis de Bernardy Dec 01 '14 at 20:19
  • Your note on DB sizes is all good and accurate, the point I was making is that it's going slower than what he wants, so the table size is an issue on the hardware that he's using, at least (upgrading hardware would make it go faster, but I don't think he's after that solution here). The other point that I'm making is that you only have to read 3.9M rows if there are 3.9M rows before the 100 that you want. If you can intelligently break up the data, then you don't have to read 3.9M rows, just however many rows are in the shard that you've intelligently deduced the data to be in. – Trevor Young Dec 01 '14 at 20:41
  • Id isn't ideal for this and I've added a note to my answer for this. I'll leave off on the subject because we've got a lot of comments now. – Trevor Young Dec 01 '14 at 20:42
  • To be able to head to a particular partition knowing you'll hit precisely the Nth row in a set with certainty, you need to create -- and much more importantly, maintain/flush/regenerate -- appropriate indexes, as suggested by @benjist. But it's not at all simple, and most certainly not worthwhile when the DB goes to size S or larger. The best is to simply avoid this type of query. Google limits search results to the top few thousands, and dodges pagination altogether on blogger (using a date anchor) for very, very good reasons. – Denis de Bernardy Dec 01 '14 at 20:59
0

First, you have to define limit and offset with an order by clause, or you will get inconsistent results.

To speed up the query, you can have a computed index, but only under these conditions:

  1. Newly inserted data is strictly in id order
  2. No deletes or updates on the id column

Here's how you can do it:

  1. Create a row position function

create or replace function id_pos (bigint) returns bigint as 'select count(id) from test_table where id <= $1;' language sql immutable;

  2. Create a computed index on the id_pos function

create index table_by_pos on test_table using btree(id_pos(id));

Here's how you call it (for offset 3900000, limit 100):

select * from test_table where id_pos(id) >= 3900000 and id_pos(id) < 3900100;

This way, the query will not compute the 3900000 offset rows, but only the 100 rows you need, making it much faster.

Please note the two conditions under which this approach works; otherwise the positions will change.
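
If those conditions are ever violated, the stored positions go stale; one blunt (and expensive, since it recomputes id_pos for every row) way to bring them back in line is to rebuild the expression index:

reindex index table_by_pos;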

Soni Harriz
  • This isn't a valid approach, nor a valid immutable function, because deleting a row somewhere in the table will mean needing to rebuild the index for all subsequent rows. – Denis de Bernardy Dec 01 '14 at 16:40
  • I am using something like this to send data in parallel to a cloud based visualization tool which only allows Append or Replace data sets. I rebuild the row_number in a materialized view once per day, then send chunks of rows where ID > previous_id LIMIT 200,000. I used date ranges before, but I ran into data which had extremely sparse data for a period of time, and then periods where there were far too many rows to send per query. I need to send ALL data in the table, but ensure that I do not send too many rows at once. – trench Jul 16 '18 at 11:48
0

How about paginating based on IDs instead of offset/limit?

The following query will give the IDs that split all the records into chunks of size per_page. It doesn't depend on whether records were deleted or not:

SELECT id AS from_id FROM (
  SELECT id, (ROW_NUMBER() OVER(ORDER BY id DESC)) AS num FROM test_table
) AS rn
WHERE num % (per_page + 1) = 0;

With these from_ids you can add links to the page. Iterate over :from_ids with an index and add the following link to the page:

<a href="/test_records?from_id=:from_id">:from_id_index</a>

When the user visits the page, retrieve the records whose ID is greater than the requested :from_id:

SELECT * FROM test_table WHERE ID >= :from_id ORDER BY id DESC LIMIT :per_page

For the first page, a link with from_id=0 will work:

<a href="/test_records?from_id=0">1</a>
Hirurg103
0

To avoid slow pagination with big tables, always use an auto-increment primary key, then use the query below:

SELECT * FROM test_table WHERE id > (SELECT min(id) FROM test_table WHERE id > ((1 * 10) - 10)) ORDER BY id DESC LIMIT 10

Here, 1 is the page number and 10 is the number of records per page.

Tested, and it works well with 50 million records.

Yassine Khachlek
  • Auto incremented columns are not guaranteed to increment by 1. You can skip increments manually, by deleted rows, or by using sequence caching. – Pål Thingbø Oct 13 '22 at 17:35
0

There are two simple approaches to solving such a problem:

  • Splitting the query into two subqueries, so that the first one does all the heavy work via an index-only scan, as described here
  • Creating a calculated index that holds the offset, as described here; this can be enhanced using window functions (see the sketch below)
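
One way the window-function variant might look on 9.3+ is a materialized view that precomputes row positions (names are hypothetical, and the view has to be refreshed whenever the data changes):

create materialized view test_table_positions as
select id, row_number() over (order by id) as pos
from test_table;

create index test_table_positions_pos_idx on test_table_positions (pos);

-- pagination becomes an index lookup plus a join back to the base table
select t.*
from test_table_positions p
join test_table t on t.id = p.id
where p.pos > 3900000 and p.pos <= 3900100
order by p.pos;

-- after inserts/deletes the positions must be rebuilt
refresh materialized view test_table_positions;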
Mohammed Essehemy