2

What does MySQL perform first: The WHERE clause or the ORDER BY clause?

The reason I ask is to determine whether I should add an index to a given column.

I have a table such as the following:

| Column    | Type        | Index | 
|-----------|-------------|-------| 
| id        | INT (pk)    | Yes   | 
| listorder | INT         | ??    | 
| data      | VARCHAR(16) | No    | 
| fk        | INT (fk)    | Yes   | 

I will often execute queries such as SELECT id, data FROM mytable WHERE fk=12345 ORDER BY listorder ASC. For my data set, it will only result in a small number of records (~5) for a given fk, however, there are many records in the table with many fk values, and many duplicated listorder values spanning the many fk values.

If the WHERE clause is performed first, then I expect I shouldn't add an index to listorder as it will result in UPDATE performance degradation without significant improvement for SELECT.

user1032531
  • 24,767
  • 68
  • 217
  • 387
  • 1
    What Gordon Linoff sais. Plus, when evaluating indexes, never pose abstract questions, but always run `EXPLAIN` on the **actual** query with the **actual** data (or a reasonable set of what the data could be), because different data sets may trigger different indexes. – watery Aug 21 '15 at 01:37
  • @watery I currently don't have any data, but tried to explain what the data will look like as best as I could. – user1032531 Aug 21 '15 at 01:40
  • Yuk, I understand, but that's why I said "or a reasonable set of what the data could be", because the real data may change the index usage / selection (i.e. when dealing with multiple indexes on the same table). You may never be sure the index you create will be used as you expect. You could find that your DBMS isn't using the index at all. Index usage could even change with changes in datasets, that's why I suggested that. – watery Aug 21 '15 at 01:47
  • I have two very good answers. I am very happy, but it makes it tough to select one. Which I could just upvote both of them a few more times... – user1032531 Aug 21 '15 at 01:57
  • @user1032531, Ollie has the more correct answer. MySQL will read exactly the rows it needs from the index, which are *already* stored in the desired order, eliminating any need for sorting, if you have the correct multi column index. Add it. Yes, indexes increase update time, but they tend to *decrease `SELECT` time even more* unless your database has a high ratio of writes-to-reads, which is not usually the case. Be sure you read up on `EXPLAIN SELECT`. – Michael - sqlbot Aug 21 '15 at 03:08
  • @Michael-sqlbot Yes, I tend to agree, yet they were both good. – user1032531 Aug 21 '15 at 04:09
  • Let me just throw in this observation, in agreement with @watery. The actual sizes and definitions of the tables are always relevant. When your tables are a few hundred rows, everything is easy. When they grow to millions or hundreds of millions of rows, good indexing becomes an intricate task. – O. Jones Aug 21 '15 at 11:35

3 Answers3

3

The way SQL (all makes and models of servers) uses indexes to satisfy queries is a little more complex than you're assuming. Usually a query gets satisfied by filtering first (WHERE) then ordering.

For the exact query you showed us, if you have a compound index on (fk, listorder) the SQL engine will be able to use the index to satisfy both clauses of your query. The index will first be random-accessed by the WHERE clause, then it will be already in the order needed to satisfy your sorting clause.

Read this: http://use-the-index-luke.com/

Updating a compound index is not much more expensive than updating a single column index. Either way, using an index is better than having to scan the table for a WHERE operation.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • Thanks Ollie, I will read your suggested link. I see how the compound index will give me the best of both worlds for `SELECT` queries. Will it cause issues on `UPDATE` queries? Or does that performance impact already occur as I have (and need) and index on the tables PK and `fk`? – user1032531 Aug 21 '15 at 01:43
  • See my edit, please. Your fk index is, you say, very selective, so you probably won't see much difference in performance from adding the second column to that index. – O. Jones Aug 21 '15 at 01:45
2

The WHERE clause is evaluated first. I think this is always true in MySQL, but there might be an occasional exception (at least in other databases there is).

For this query:

SELECT id, data
FROM mytable
WHERE fk = 12345
ORDER BY listorder ASC;

The most practical index is mytable(fk, listorder).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I saw your answer change from `mostly` to `always` to `almost always` :) I am not concerned with other databases. Wouldn't index `mytable(fk, listorder)` causes slow updates (updates will happen fairly regularly for my application)? – user1032531 Aug 21 '15 at 01:38
  • @user1032531 . . . Any index will slow `update` a bit. This is generally not an issue, unless your load is measured in hundreds of updates per second. Some database engines will scan an index for the sort (on occasion), and then apply the `where`. I don't think MySQL does this. – Gordon Linoff Aug 21 '15 at 01:45
1

In SQL processing, the WHERE clause is considered an implicit join statement. In fact it is even equivalent to INNER JOIN among query optimizers. Older ANSI syntax only adopted INNER JOIN in the early 90s. Many older SQL select statements read as below:

SELECT * 
FROM table1, table2
WHERE table1.ID = table2.ID

which later the gold standard is as follows:

SELECT * 
FROM table1
INNER JOIN table2
ON table1.ID = table2.ID

However both statements are equivalent. But many argue INNER JOIN is more human readable. See this hearty SO post on INNER vs WHERE.

Unlike most programming languages, in SQL the order of syntax does not determine order of processing. Ironically though, the last line ORDER BY (unless TOP or LIMIT is declared) is usually the very last step and WHERE among the first just after the FROM clause:

FROM table source
JOIN condition
WHERE condition
GROUP BY expression
HAVING condition
SELECT fields
ORDER BY fields

Essentially, the engine structures the table and/or virtual tables determined by FROM, JOIN, and WHERE clauses. Once that structure is set up, then aggregation, field selection, and ordering is handled. So you could not order the table before you have the table!

Indices help in nearly all aspects of the processing. Setting an index on ORDER BY would not lead to performance degradation. But aligning WHERE and ORDER BY can facilitate sorting optimization. See this MySQL reference. In fact, MySQL is known to leave out indices if not needed.

Community
  • 1
  • 1
Parfait
  • 104,375
  • 17
  • 94
  • 125