0

In the book High performance mysql 3rd edition, it says

A frequent problem is having a high value for the offset. If your query looks like LIMIT 10000, 20, it is generating 10,020 rows and throwing away the first 10,000 of them, which is very expensive.

One simple technique to improve efficiency is to do the offset on a covering index, rather than the full rows.

Consider the following query:

SELECT film_id, description FROM sakila.film ORDER BY title LIMIT 50, 5;

If the table is very large, this query is better written as follows:

mysql> SELECT film.film_id, film.description
-> FROM sakila.film
-> INNER JOIN (
-> SELECT film_id FROM sakila.film
-> ORDER BY title LIMIT 50, 5
-> ) AS lim USING(film_id);

This “deferred join” works because it lets the server examine as little data as possible in an index without accessing rows, and then, once the desired rows are found, join them against the full table to retrieve the other columns from the row.

My question is, does the second SQL preserve the same result set order as the first SQL, since it has no order by clause outside?

Here is the table DDL for quick reference:

CREATE TABLE film (
  film_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  title VARCHAR(128) NOT NULL,
  description TEXT DEFAULT NULL,
  release_year YEAR DEFAULT NULL,
  language_id TINYINT UNSIGNED NOT NULL,
  original_language_id TINYINT UNSIGNED DEFAULT NULL,
  rental_duration TINYINT UNSIGNED NOT NULL DEFAULT 3,
  rental_rate DECIMAL(4,2) NOT NULL DEFAULT 4.99,
  length SMALLINT UNSIGNED DEFAULT NULL,
  replacement_cost DECIMAL(5,2) NOT NULL DEFAULT 19.99,
  rating ENUM('G','PG','PG-13','R','NC-17') DEFAULT 'G',
  special_features SET('Trailers','Commentaries','Deleted Scenes','Behind the Scenes') DEFAULT NULL,
  last_update TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY  (film_id),
  KEY idx_title (title),
  KEY idx_fk_language_id (language_id),
  KEY idx_fk_original_language_id (original_language_id),
  CONSTRAINT fk_film_language FOREIGN KEY (language_id) REFERENCES language (language_id) ON DELETE RESTRICT ON UPDATE CASCADE,
  CONSTRAINT fk_film_language_original FOREIGN KEY (original_language_id) REFERENCES language (language_id) ON DELETE RESTRICT ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Chen Li
  • 320
  • 1
  • 10
  • I'm not really sure this is needed in the first place. If the `ORDER BY` column is indexed, the index will be used in the original query. – Barmar Feb 10 '23 at 01:51
  • For that specific query, the motive is not clear. So I add some context. it does improve performance in this case and some others cases. – Chen Li Feb 10 '23 at 02:03
  • Where does the book quote end? Please put the whole quote in a block quote (see the `"` button in the editor). – Kelly Bundy Feb 10 '23 at 08:00
  • You _must_ add another `ORDER BY` on the outside if you want it. – Rick James Feb 10 '23 at 19:24

1 Answers1

1

In my experience the result will preserve the ordering of the subquery, but I'm not sure it's guaranteed. So you should repeat the ORDER BY clause.

SELECT film.film_id, film.description
FROM sakila.film
INNER JOIN (
    SELECT film_id 
    FROM sakila.film
    ORDER BY title 
    LIMIT 50, 5
) AS lim USING(film_id)
ORDER BY title
Barmar
  • 741,623
  • 53
  • 500
  • 612
  • Thanks for your reply! By repeating the ORDER BY clause, the explain will show "use filesort". I doubt if "filesort" in this case is a big deal? What if it's not "LIMIT 50, 5" , but "LIMIT 2000, 5"? By asking this quesiton, I want to know if I can omit the repeating ORDER BY clause. – Chen Li Feb 10 '23 at 02:07
  • I just found [another question](https://stackoverflow.com/questions/26372511/mysql-mariadb-order-by-inside-subquery) where the answer says that ORDER BY in the subquery is applied when it also uses `LIMIT`. I guess it's intended to support this precise use. – Barmar Feb 10 '23 at 15:41