27

I'm having performance problems when LIMITing a MySQL SELECT with a large offset:

SELECT * FROM table LIMIT m, n;

If the offset m is, say, larger than 1,000,000, the operation is very slow.

I do have to use LIMIT m, n; I can't use something like WHERE id > 1,000,000 LIMIT n.

How can I optimize this statement for better performance?

secondtruth
  • See also https://stackoverflow.com/questions/4481388/why-does-mysql-higher-limit-offset-slow-the-query-down – caw Jan 18 '19 at 00:06

6 Answers

13

Perhaps you could create an indexing table which provides a sequential key relating to the key in your target table. Then you can join this indexing table to your target table and use a where clause to more efficiently get the rows you want.

# create a table to store the sequence
CREATE TABLE seq (
   seq_no INT NOT NULL AUTO_INCREMENT,
   id INT NOT NULL,
   PRIMARY KEY (seq_no),
   UNIQUE (id)
);

# build the sequence
TRUNCATE seq;
INSERT INTO seq (id) SELECT id FROM mytable ORDER BY id;

# now get 1000 rows starting at offset 1000000
SELECT mytable.*
FROM mytable
INNER JOIN seq USING (id)
WHERE seq.seq_no BETWEEN 1000000 AND 1000999;
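
If mytable keeps growing, the sequence table goes stale and has to be rebuilt or maintained. A minimal sketch of keeping it in step automatically, assuming rows are only ever inserted (the trigger name is made up):

# hypothetical trigger to keep seq in step with mytable
# (assumes rows are only inserted, never deleted)
CREATE TRIGGER mytable_seq_sync
AFTER INSERT ON mytable
FOR EACH ROW
INSERT INTO seq (id) VALUES (NEW.id);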
Paul Dixon
  • This approach only works for SELECT statements that don't contain a WHERE condition. In my opinion it is not a good solution. – Behrouz.M Jun 22 '11 at 06:37
  • How do you keep this index table updated? In my case I have to order by a datetime column and use large offsets, resulting in slow queries. If I create this support table, I will need to re-insert every time I get a new date, since the rows don't arrive in order. I have already seen this solution, but with temporary tables. – Keyne Viana Aug 19 '11 at 22:31
  • If I'm reading this correctly, you are just duplicating the id column from mytable into another table (and you would have to keep both tables updated). Could you not just join the table on itself, selecting only the id? This is what most workarounds I've seen do. – Gremio Jul 28 '18 at 17:52
10

If the records are large, the slowness may be coming from loading the data. If the id column is indexed, then just selecting it will be much faster. You can then do a second query with an IN clause for the appropriate ids (or you could formulate a WHERE clause using the min and max ids from the first query).

slow:

SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000

fast:

SELECT id FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000

SELECT * FROM table WHERE id IN (1,2,3...10)
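
The min/max variant mentioned above would look roughly like this; the bounds are placeholders for the smallest and largest ids returned by the first query:

# hypothetical bounds copied from the first query's result
SELECT * FROM table
WHERE id BETWEEN 49991 AND 50000
ORDER BY id DESC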
Scott Nelson
  • This is actually the best answer here and does what Jeff Atwood’s blog post linked in [the other answer](https://stackoverflow.com/a/1244053/89818) describes. – caw Jan 18 '19 at 00:03
9

There's a blog post somewhere on the internet arguing that the selection of the rows to show should be as compact as possible, thus: just the ids; producing the complete results should in turn fetch all the data you want for only the rows you selected.

Thus, the SQL might be something like this (untested; I'm not sure it will actually do any good):

SELECT A.* FROM table A
  INNER JOIN (SELECT id FROM table ORDER BY whatever LIMIT m, n) B
  ON A.id = B.id
ORDER BY A.whatever

If your SQL engine is too primitive to allow this kind of SQL statement, or if it doesn't improve anything, it might be worthwhile to break this single statement into multiple statements and capture the ids into a data structure.

Update: I found the blog post I was talking about: it was Jeff Atwood's "All Abstractions Are Failed Abstractions" on Coding Horror.

bart
  • I tested the SQL you suggested, but it does not give any improvement. – Behrouz.M Jun 22 '11 at 06:35
  • What if you have a WHERE clause based on table A? It will not work, since it limits first and then applies the WHERE clause. If you use a join inside your subquery, you will lose performance, right? – Keyne Viana Aug 19 '11 at 22:36
  • It worked for me: the `SELECT id FROM ...` query was executed about 50 times faster on a set of almost a million rows, compared to `SELECT bunch,of,fields FROM ...`. – mr.b Aug 28 '12 at 23:17
  • Thanks for the pointer to Atwood's article; that's interesting reading. But it doesn't recommend always doing what you say it does; rather, it uses this technique as an example of something that works *here*. I'd argue that the whole premise of the article is that databases are complex beasts, and no one solution can suit all cases (hence abstractions are inevitably "leaky"). – alexis Nov 16 '15 at 09:49
3

I don't think there's any need to create a separate index if your table already has a primary key. If so, you can order by this primary key and then use its values to step through the table:

SELECT * FROM myBigTable WHERE id > :OFFSET ORDER BY id ASC;

Another optimisation would be not to use SELECT * but just the ID, so that the engine can simply read the index and doesn't have to then locate all the data (this reduces IO overhead). If you need some of the other columns, then perhaps you could add these to the index so that they are read along with the primary key (which will most likely be held in memory and therefore not require a disc lookup). This will not be appropriate for all cases, so you will have to have a play.
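
For example, a sketch of such a covering index; the extra columns name and created_at are made-up stand-ins for whatever your query actually needs:

# hypothetical covering index: with id, name and created_at all in the
# index, this query can be answered from the index alone
ALTER TABLE myBigTable ADD INDEX idx_page (id, name, created_at);

# page size of 20 chosen arbitrarily
SELECT id, name, created_at
FROM myBigTable
WHERE id > :OFFSET
ORDER BY id ASC
LIMIT 20;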

SlappyTheFish
  • Is it just MySQL, or do most DBs act in this odd way? So far, the best solution is the subquery (when you do not have an ordered index): query and order everything first, then apply the offset. – Keyne Viana Aug 19 '11 at 22:50
  • The idea of using just the ID may be a very good solution indeed; it depends on the storage engine, I suppose! – twicejr Aug 14 '14 at 17:13
2

Paul Dixon's answer is indeed a solution to the problem, but you'll have to maintain the sequence table and ensure that there are no row gaps.

If that's feasible, a better solution would be to simply ensure that the original table has no row gaps, and starts from id 1. Then grab the rows using the id for pagination.

SELECT * FROM table A WHERE id >= 1 AND id <= 1000;
SELECT * FROM table A WHERE id >= 1001 AND id <= 2000;

and so on...
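
In general, page p (1-based) with a page size of s covers ids (p - 1) * s + 1 through p * s; for example, page 3 with a page size of 1000:

SELECT * FROM table A WHERE id >= 2001 AND id <= 3000;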

0

I ran into this problem recently. There were two parts to the fix. First, I had to use an inner select in my FROM clause that did the limiting and offsetting for me, on the primary key only:

$subQuery = DB::raw("(SELECT id FROM titles WHERE id BETWEEN {$startId} AND {$endId} ORDER BY title) AS t");

Then I could use that as the from part of my query:

$query = DB::query()   // hypothetical opening; the original snippet began mid-call
    ->select(
        'titles.id',
        'title_eisbns_concat.eisbns_concat',
        'titles.pub_symbol',
        'titles.title',
        'titles.subtitle',
        'titles.contributor1',
        'titles.publisher',
        'titles.epub_date',
        'titles.ebook_price',
        'publisher_licenses.id as pub_license_id',
        'license_types.shortname',
        $coversQuery
    )
    ->from($subQuery)
    ->leftJoin('titles', 't.id', '=', 'titles.id')
    ->leftJoin('organizations', 'organizations.symbol', '=', 'titles.pub_symbol')
    ->leftJoin('title_eisbns_concat', 'titles.id', '=', 'title_eisbns_concat.title_id')
    ->leftJoin('publisher_licenses', 'publisher_licenses.org_id', '=', 'organizations.id')
    ->leftJoin('license_types', 'license_types.id', '=', 'publisher_licenses.license_type_id');
The first time I created this query I used OFFSET and LIMIT in MySQL. This worked fine until I got past page 100; then the offset started getting unbearably slow. Changing that to BETWEEN in my inner query sped it up for any page. I'm not sure why MySQL hasn't sped up OFFSET, but BETWEEN seems to reel it back in.
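
For reference, the plain-SQL shape of the query above is roughly the following sketch; the joins are trimmed to one and the BETWEEN bounds are placeholders:

# rough plain-SQL shape of the builder above; bounds are placeholders
SELECT titles.id, titles.title, titles.publisher
FROM (
    SELECT id FROM titles
    WHERE id BETWEEN 100000 AND 100999
    ORDER BY title
) AS t
LEFT JOIN titles ON t.id = titles.id;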

PhPGuy
  • This is very similar to many other solutions where you know beforehand what ID you want to start at for limiting (there are probably more elegant ways to do this). The main problem with that is when you need to display pages in the middle and you have no idea what ID to start at (determined by the WHERE clause). My guess is the ORDER BY title in your subquery is not indexed. You can try using EXPLAIN to figure out what's going on, and creating new indices. Trying to index text can be problematic. https://dba.stackexchange.com/questions/35821/possible-index-on-a-varchar-field-in-mysql – Gremio Jul 28 '18 at 18:04