225

Scenario in short: a table with more than 16 million records [2GB in size]. The higher the LIMIT offset with SELECT, the slower the query becomes when using ORDER BY *primary_key*.

So

SELECT * FROM large ORDER BY `id`  LIMIT 0, 30 

takes far less than

SELECT * FROM large ORDER BY `id` LIMIT 10000, 30 

That only orders 30 records, and it's the same either way, so the overhead isn't from ORDER BY.
Now, when fetching the latest 30 rows, it takes around 180 seconds. How can I optimize that simple query?

Brendan Long
  • 53,280
  • 21
  • 146
  • 188
Rahman
  • 2,253
  • 3
  • 14
  • 6
  • NOTE: I'm the author. MySQL doesn't refer to the index (PRIMARY) in the above cases. See the answer by user "Quassnoi" below for an explanation. – Rahman Dec 27 '10 at 15:18
  • possible duplicate of [How can I speed up a MySQL query with a large offset in the LIMIT clause?](http://stackoverflow.com/questions/1243952/how-can-i-speed-up-a-mysql-query-with-a-large-offset-in-the-limit-clause) – Lightness Races in Orbit Dec 11 '11 at 20:48
  • 2
    A related link: [We need tool support for keyset pagination](https://use-the-index-luke.com/no-offset). If you’d like to know what happens inside the database when using offset or keyset pagination, have a look at those slides. – Jason Law Jan 22 '21 at 10:47

6 Answers

286

I had the exact same problem myself. Given that you want to collect a large amount of this data and not a specific set of 30, you'll probably be running a loop and incrementing the offset by 30.

So what you can do instead is:

  1. Hold the last id of a set of data (30) (e.g. lastId = 530)
  2. Add the condition WHERE id > lastId LIMIT 0, 30

So you can always have a ZERO offset. You will be amazed by the performance improvement.
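
For illustration, a minimal sketch of that loop (using the `large` table and `id` column from the question; the batch size of 30 and the id value 530 are just examples):

-- First batch: no offset needed
SELECT * FROM large ORDER BY id LIMIT 30;
-- Suppose the last row returned has id = 530; remember it as lastId.

-- Every following batch: seek past lastId instead of using an offset
SELECT * FROM large WHERE id > 530 ORDER BY id LIMIT 30;
-- Repeat, each time replacing 530 with the last id of the previous batch.

Because the WHERE clause seeks directly into the index on id, each batch costs roughly the same no matter how far into the table you are.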

Elzo Valugi
  • 27,240
  • 15
  • 95
  • 114
Nikos Kyr
  • 3,140
  • 1
  • 13
  • 16
  • Does this work if there are gaps? What if you don't have a single unique key (a composite key for example)? – xaisoft Aug 08 '13 at 17:51
  • 10
    It may not be obvious to all that this only works if your result set is sorted by that key, in ascending order (for descending order the same idea works, but change > lastid to < lastid.) It doesn't matter if it's the primary key, or another field (or group of fields.) – Eloff Sep 16 '13 at 14:14
  • Well done that man! A very simple solution that has solved my problem :-) – oodavid Dec 24 '13 at 14:51
  • 49
    Just a note that limit/offset is often used in paginated results, and holding lastId is simply not possible because the user can jump to any page, not always the next page. In other words, offset often needs to be calculated dynamically based on page and limit, instead of following a continuous pattern. – Tom Dec 28 '13 at 14:23
  • 1
    If it is just pagination and sorting by id ONLY, with a simple where clause, you can use this if you want: `select id,name from large_table where id>=(SELECT id FROM large_table where name like '%the%' limit 1 offset 2000001) and name like '%the%' limit 50;` It is fast. – Abdul Rahman Mohsen Feb 10 '16 at 07:21
  • 3
    I talk at more length about "remembering where you left off" in http://mysql.rjweb.org/doc.php/pagination – Rick James Jan 24 '17 at 23:14
  • Although completely unprofessional, yet I have to admit it: Love ya! – Sanosay Apr 03 '18 at 11:19
  • What about there's no lastId ? – shaoyihe Jul 11 '18 at 09:07
  • How do you remember your last position when the user visits Page 563 from a bookmark? It seems like there's no alternative to limit/offset in this case, which is sad. I have a 6 million row sample dataset and the pagination is a pain if you cannot implement unlimited scrolling. – Lanti Jul 12 '18 at 19:49
  • 4
    man, you are a life saver. I have 5 mil rows that needed around 90 mins to process with offset and limit; when I tried your answer, it only took 9 mins. Thank you man. THANK YOU!! – Gery Ruslandi Mar 12 '21 at 08:26
  • 2
    @Lanti Let's assume that Page 563 begins at offset 563 * 30 = 16890, since in the OP's example 30 is the page size and assume page numbering starts from 0. Further assume that column `id` is unique and is indexed. Then execute `select id from large order by id limit 16889, 1` to read the id of the last row of Page 562. This should be reasonably efficient since only the index is involved. Now you have the "lastId" to proceed with selecting the next page. – Booboo Apr 09 '22 at 22:24
229

It's normal that higher offsets slow the query down, since the query needs to count off the first OFFSET + LIMIT records (and take only LIMIT of them). The higher this value is, the longer the query runs.

The query cannot go right to OFFSET because, first, the records can be of different length, and, second, there can be gaps from deleted records. It needs to check and count each record on its way.

Assuming that id is the primary key of a MyISAM table, or a unique non-primary key field on an InnoDB table, you can speed it up by using this trick:

SELECT  t.* 
FROM    (
        SELECT  id
        FROM    mytable
        ORDER BY
                id
        LIMIT 10000, 30
        ) q
JOIN    mytable t
ON      t.id = q.id

See this article:

Quassnoi
  • 413,100
  • 91
  • 616
  • 614
  • 9
    MySQL "early row lookup" behavior was the answer why it's talking so long. By the trick you provided, only matched ids (by the index directly) are bound, saving unneeded row lookups of too many records. That did the trick, hooray! – Rahman Dec 27 '10 at 15:08
  • 1
    Awesome ... are there any limitations, where this trick will not work? – aurora Nov 24 '11 at 17:08
  • 4
    @harald: what exactly do you mean by "not work"? This is a pure performance improvement. If there is no index usable by `ORDER BY` or the index covers all fields you need, you don't need this workaround. – Quassnoi Nov 24 '11 at 18:13
  • @Quassnoi: i don't know. i played around with this on some own tables with millions of rows where i had performance problems before and the solution you provided works like a charm for. i guess i have to delve a little deeper in what's going on here to fully understand this solution. thanks! – aurora Nov 24 '11 at 20:58
  • 1
    From my test this solution has its limits. For a table with 16M rows, this method is getting slow once offset is around half the table, ie. LIMIT 7100000,100000 – f055 Aug 07 '12 at 14:38
  • 9
    @f055: the answer says "speed up", not "make instant". Have you read the very first sentence of the answer? – Quassnoi Aug 07 '12 at 17:41
  • Is this approach applicable to PostgreSQL? How it compares to using a server-side cursor in terms of performance? I'm talking about data volume of around 2M records per table, for some tables. Inspiration for this comment, if you're curious, is my question on SO: http://stackoverflow.com/questions/24264817/handling-paginated-sql-query-results. – Aleksandr Blekh Jun 19 '14 at 05:46
  • 3
    Is it possible to run something like this for InnoDB? – Tom Raganowicz May 27 '15 at 15:17
  • 1
    @NeverEndingQueue: in InnoDB, tables are clustered on primary key, so it's useless if you order by the primary key. If your order by a secondary index key, then yes, it would make sense. – Quassnoi May 27 '15 at 16:33
  • 2
    @aurora - The trick will slow down once you have to scan so much of the index on `id` that the 'trick' becomes slow. The real 'fix' is to "remember where you left off". – Rick James Jul 06 '15 at 21:05
  • So, does MySQL re-use position, it gained on previous call to `LIMIT` if calls go sequentially? – Dims Jun 13 '16 at 11:38
  • 1
    @Dims: no it doesn't. – Quassnoi Jun 13 '16 at 12:24
  • How to implement this with multiple WHERE statements, like: `WHERE categories &> '{news}' AND title ILIKE ALL (ARRAY['%hello%'])`? Seems like this "hack" doesn't work inside a sub-query, execution time as long as without sub-query with primary key column. Of course, all of those columns are indexed. This is needed for category listing and full-text search. – Lanti Aug 05 '18 at 13:54
  • 3
    @Lanti: please post it as a separate question and don't forget to tag it with `postgresql`. This is a MySQL-specific answer. – Quassnoi Aug 05 '18 at 14:33
  • So this does what you would do manually if you only fetched the columns covered by a single key first (here: IDs only) and then used a second query to fetch both the IDs and some other data for these records. – caw Jan 18 '19 at 00:12
  • 1
    Hi! I made **a comparison (with a plot)** here: https://stackoverflow.com/a/60472885/4619958 :) – ch271828n Mar 01 '20 at 07:23
  • I'm using Innodb's primary key `id` field as order by condition, but I'm still seeing a significant performance increase(780ms -> 13.7ms). AFAIK, Innodb is using cluster index, why is the performance still got increased? – ospider Jan 11 '21 at 02:22
  • @ospider: post your setup and some sample data in a separate question and leave the link in a comment here – Quassnoi Jan 11 '21 at 03:56
21

MySQL cannot go directly to the 10000th record (or the 80000th byte, as you're suggesting) because it cannot assume that the data is packed/ordered like that (or that it has continuous values from 1 to 10000). Although it might be that way in actuality, MySQL cannot assume that there are no holes/gaps/deleted ids.

So, as bobs noted, MySQL will have to fetch 10000 rows (or traverse through the first 10000 entries of the index on id) before finding the 30 to return.

EDIT: To illustrate my point

Note that although

SELECT * FROM large ORDER BY id LIMIT 10000, 30 

would be slow(er),

SELECT * FROM large WHERE id >  10000 ORDER BY id LIMIT 30 

would be fast(er), and would return the same results provided that there are no missing ids (i.e. gaps).

Riedsio
  • 9,758
  • 1
  • 24
  • 33
  • 2
    This is correct. But since it's limited by "id", why does it take so long when that id is within an index (primary key)? Optimizer should refer to that index directly, and then fetch the rows with matched ids ( which came from that index) – Rahman Dec 27 '10 at 15:15
  • 1
    If you used a WHERE clause on id, it could go right to that mark. However, if you put a limit on it, ordered by id, it's just a relative counter to the beginning, so it has to traverse the whole way. – Riedsio Dec 27 '10 at 18:11
  • Very good article https://www.eversql.com/faster-pagination-in-mysql-why-order-by-with-limit-and-offset-is-slow/ – Pažout Aug 20 '18 at 07:39
  • Worked for me @Riedsio Thanks. – mahesh kajale Oct 09 '18 at 12:17
  • This answer is wrong if there are other conditions in WHERE clause. – charlesdk Aug 16 '22 at 10:31
12

I found an interesting example to optimize SELECT queries with ORDER BY id LIMIT X, Y. I have 35 million rows, so it took about 2 minutes to find a range of rows.

Here is the trick:

SELECT id, name, address, phone
FROM customers
WHERE id > 990
ORDER BY id LIMIT 1000;

Just adding a WHERE clause with the last id you got increases the performance a lot. For me it went from 2 minutes to 1 second :)

Other interesting tricks here : http://www.iheavy.com/2013/06/19/3-ways-to-optimize-for-paging-in-mysql/

It also works with strings.
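
For example, a minimal sketch assuming an index on `name` and unique values (with duplicates you would need an extra tie-breaker column such as `id`); the value 'Lopez' is just a placeholder for the last name of the previous page:

SELECT id, name, address, phone
FROM customers
WHERE name > 'Lopez'   -- last name value from the previous page
ORDER BY name
LIMIT 1000;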

sym
  • 357
  • 4
  • 7
  • 2
    this works only for tables, where no data are deleted – miro Apr 18 '16 at 19:08
  • 2
    @miro That's only true if you are working under the assumption that your query can do lookups at random pages, which I don't believe this poster is assuming. While I don't like this method for most real world cases, this will work with gaps as long as you are always basing it off the last id obtained. – Gremio Jul 28 '18 at 17:44
5

The time-consuming part of the two queries is retrieving the rows from the table. Logically speaking, in the LIMIT 0, 30 version, only 30 rows need to be retrieved. In the LIMIT 10000, 30 version, 10000 rows are evaluated and 30 rows are returned. Some optimization can be done in the data-reading process, but consider the following:

What if you had a WHERE clause in the queries? The engine must return all rows that qualify, and then sort the data, and finally get the 30 rows.

Also consider the case where rows are not processed in the ORDER BY sequence. All qualifying rows must be sorted to determine which rows to return.
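
For instance, in a hypothetical query like the one below (the `flag` and `created_at` columns are made up for illustration, and `created_at` is assumed to be unindexed), the engine has to collect every row matching the WHERE clause, sort them all by `created_at`, and only then skip 10000 rows and return 30:

SELECT *
FROM large
WHERE flag = 1        -- hypothetical filter column
ORDER BY created_at   -- hypothetical non-indexed column: forces a full sort
LIMIT 10000, 30;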

bobs
  • 21,844
  • 12
  • 67
  • 78
  • 1
    just wondering why it consumes time to fetch those 10000 rows. The index used on that field (id, which is a primary key) should make retrieving those rows as fast as seeking that PK index for record no. 10000, which in turn is supposed to be as fast as seeking the file to that offset multiplied by the index record length (i.e., seeking to 10000*8 = byte no. 80000, given that 8 is the index record length) – Rahman Dec 20 '10 at 20:00
  • @Rahman - The only way to count past the 10000 rows is to step over them one by one. This _may_ just involve an index, but still index rows take time to step through. There is _no_ MyISAM or InnoDB structure that can correctly (in all cases) "seek" to record 10000. The 10000*8 suggestion assumes (1) MyISAM, (2) FIXED length record, and (3) never any deletes from the table. Anyway, MyISAM indexes are BTrees, so it would not work. – Rick James Jan 24 '17 at 23:12
  • As this answer stated, I believe, the really slow part is the row lookup, not traversing the indexes (which of course will add up as well, but nowhere near as much as the row lookups on disk). Based on the workaround queries provided for this issue, I believe the row lookups tend to happen if you are selecting columns outside of the index -- even if they are not part of the order by or where clause. I haven't found a reason why this is necessary, but it appears to be why some of the workarounds help. – Gremio Jul 28 '18 at 17:38
  • I believe the delay is caused by counting the entries in the index tree, as opposed to finding the starting index (for which the SQL index tree is optimized and it gets pointed close to the target row, without going through particular rows). The next part, reading the number of rows, is equally "slow" when using `WHERE ID > x`. But the latter is useless in most real world applications anyway. – Oak_3260548 Jul 30 '20 at 07:25
4

For those who are interested in a comparison and figures :)

Experiment 1: The dataset contains about 100 million rows. Each row contains several BIGINT and TINYINT columns, as well as two TEXT fields (deliberately) containing about 1k characters.

  • Blue := SELECT * FROM post ORDER BY id LIMIT {offset}, 5
  • Orange := @Quassnoi's method. SELECT t.* FROM (SELECT id FROM post ORDER BY id LIMIT {offset}, 5) AS q JOIN post t ON t.id = q.id
  • Of course, the third method, ... WHERE id>xxx LIMIT 0,5, does not appear here since it should be constant time.

Experiment 2: Similar thing, except that one row only has 3 BIGINTs.

  • green := the blue before
  • red := the orange before

[Plot: query time versus offset for both experiments]

ch271828n
  • 15,854
  • 5
  • 53
  • 88