2

Given

  • The following schema:

    CREATE TABLE employees (
        name CHAR,
        PRIMARY KEY id INT
    );
    
  • The table is sorted on id

  • There are 100 unique ids from 1-100 in the table.


Example

| name    | id |
|---------|----|
| Lynne   | 1  |
| Johnny  | 2  |
| D'Andra | 3  |
| Kimmel  | 4  |
|        ...   |

Objective

Get 10 people with ids greater than or equal to 3.


Question

Would it be faster to use select name from employees order by id limit 10 offset 3 or select name from employees where id >= 3 and id <13 order by id, and why?


What I've checked out so far

Does adding 'LIMIT 1' to MySQL queries make them faster when you know there will only be 1 result?: This says that using limit is faster than not using limit but it doesn't compare it to range queries.

Select query with offset limit is too much slow: This says that offset is often slow because it needs to go through all the rows to get to the offset. It doesn't discuss whether it is slower to use offset x than id >= x for any integer x?

oamandawi
  • 405
  • 5
  • 15
  • 3
    I think both SQL queries should have an explicit `ORDER BY` clause. It won't change anything but I question your assertion that "the table is sorted on id." Relations are sets with no defined order. What you see is an artifact of `id` being the primary key and MySql has decided to retrieve the rows by default in primary key order. Tomorrow it could do something different. – Booboo Nov 02 '20 at 18:45
  • Done, thanks @Booboo – oamandawi Nov 02 '20 at 18:46
  • 1
    don't assume there are no gaps in ids from 3 to 12 – ysth Nov 02 '20 at 18:48
  • 2
    I think you could test this for yourself. Then you'd just be left with the question of Why? – Strawberry Nov 02 '20 at 18:48
  • @Strawberry, thanks for your comment; I have been testing it out here: http://sqlfiddle.com/#!9/47da9e/3. Any tips for how to conduct a more thorough test? The range query is faster in this example, but I'm not sure how test this at scale. – oamandawi Nov 02 '20 at 18:53
  • @ysh, thanks for your comment; I'm not assuming it, it is a given in my case. – oamandawi Nov 02 '20 at 18:54

2 Answers2

5

Your two queries are not the same. They are the same only when you know that the id column has no gaps and (and no duplicates, but that makes sense for an id).

For small offsets, there shouldn't be a difference for such a simple query. I do think, though, that MySQL will read all results for the offset query and then start returning results when it hits the offset number. That is, it actually counts the rows and then outputs the one after the offset.

The where clause should cause MySQL to go directly to the right record in the index. That should be faster for larger result sets.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thank you for your answer. I understand your explanation, and it makes sense to me why the `offset` would be slower than the `where` clause, now. I won't accept this answer for a bit longer to see if others have anything else to add. – oamandawi Nov 02 '20 at 19:59
  • 1
    @oamandawi for how to handle larger offsets, see https://explainextended.com/2011/02/11/late-row-lookups-innodb/ – Strawberry Nov 02 '20 at 20:18
4

Maybe neither of those is optimal.

A common thing to do in web pages is to "paginate", wherein the first 'page' shows the 'first' 10 items, the second page shows the next 10, etc.

Using OFFSET is terribly inefficient when you get farther and farther into the list -- it must gingerly step over each of "offset" rows before seeing the 10.

If id an AUTOINCREMENT, there is no guarantee that the ids will be consecutive over time. Deletes, REPLACE, INSERT IGNORE, replication in a cluster, etc, can leave gaps. Sure, the number works nicely today. But you should not trust tomorrow after someone "fixes" something in the data.

The optimal method for paginating is "remembering where you left off". No OFFSET is used. id (or some other unique column(s)) is merely a place-holder, not a number.

More details: http://mysql.rjweb.org/doc.php/pagination

OP's Question

Would it be faster to use select name from employees order by id limit 10 offset 3 or select name from employees where id >= 3 and id <13 order by id, and why?

(Let me modify the numbers to make the Answer more obvious:)

  • limit 10 offset 300 -- The processing must touch 300 rows before getting to the desired 10; that's 310 rows touched.
  • where id >= 300 and id < 310 -- Assuming there is an index on id (probably the PRIMARY KEY), only 10 rows need to be touched.
  • As I point out in my link, this avoids an unmentioned issue: What if the row with id=305 were deleted? The range approach would get only 9 rows. So... WHERE id >=300 ORDER BY id LIMIT 10 gets you exactly 10 and cannot be fooled by missing ids.
  • Even better: Use LIMIT 11. This wastes a little in that it grabs an extra row. But it lets you know whether to include a [Next] button on the page. If you get 11 rows back, there is a "next" page. If <=10 rows, there is not. This is a small price to make your UI more user friendly; you want that, don't you?
Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Makes sense although digresses a bit from the question. Very useful info, here, nonetheless. Thank you – oamandawi Nov 03 '20 at 19:42
  • 1
    @oamandawi - I added a long paragraph to specifically address the OP's "Question". And I took it a step further. – Rick James Mar 31 '21 at 17:47
  • Thank you so much! I didn't think of using where with a limit. That's very smart. I won't change the answer because I believe the conclusion between the two original queries is still the same. – oamandawi Apr 11 '21 at 03:23
  • I read the link above. Fascinating. Q: does this solution work with tables with a UUID as the primary key? Is that where a date ORDER BY comes in? – stonedauwg Feb 16 '23 at 19:31
  • @stonedauwg - `PRIMARY KEY(uuid)` means that the data is scattered around. If you use 1 and rearrange the bits ([_UUIDs_](http://mysql.rjweb.org/doc.php/uuid), then, for example, all of "today's" rows will be clumped in one part of the BTree that holds the data -- this clustering helps with performance, especially for _huge_ tables. But... – Rick James Feb 16 '23 at 21:21
  • For the pagination technique to work, you need an index that mirrors the `ORDER BY` adequately. It makes no sense to do `ORDER BY uuid`. However, you probably have a secondary index that might work. Please provide (preferably in a separate Question) `SHOW CREATE TABLE` and some version of the `SELECT` that you want to paginate. – Rick James Feb 16 '23 at 21:24