0

What's the execution order of each part of a sql query,like SELECT、DISTINCT、FROM、WHERE、GROUP BY、ORDER BY···

I searched a large number of sites said that ORDER BY executes after SELECT,if this is true,a simple query like 'select column1 from table1 order by column2' should not execute because after executing SELECT,there is only column1 in the dataset,it can't use column2 to sort the dataset. But actually it works!

aaron
  • 1
  • 1
  • You are getting confused with select, SELECT means selecting the required data from the pages, it is not about displaying the required column in some console. – PSK Feb 20 '19 at 06:07
  • so this means no matter which columns list in SELECT clause,mysql will always search the whole data page? – aaron Feb 20 '19 at 06:28
  • The fact that the OP states in their question "a large number of sites said that ORDER BY executes after SELECT" indicates to me that the marked duplicate isn't going to contain the answers they seek, because they're more stumbling over what it *means* for the `SELECT` clause to execute *at all*. They've already found out `ORDER BY` is after `SELECT`. – Damien_The_Unbeliever Feb 20 '19 at 08:41

2 Answers2

0

Consider a query -

select distinct <columns> from 
table1 t1 inner join t2 
on t1.col=t2.col
where <conditions>
group by <col>
having <conditions>

Order of execution would be -

> From
> ON
> JOIN
> Where
> group by
> Having
> Select
> Distinct
> Order By
Vijiy
  • 1,187
  • 6
  • 21
  • if 'SELECT' executes before 'ORDER BY' ,the columns appear in ORDER BY clause should only be those appear in SELECT list? – aaron Feb 20 '19 at 06:03
  • why you think second query doesnt work? `select` is done before the `order by` http://sqlfiddle.com/#!9/93d83b/1 – Juan Carlos Oropeza Feb 20 '19 at 06:45
  • I am not able to re collect the scenario, where i have faced issue while with select and order by, have deleted these changes from my post – Vijiy Feb 20 '19 at 08:36
  • 1
    The problem is try to use the alias on the same select . like this: `SELECT 2*field as new_alias, 3*new_alias` because SELECT isnt process `new_alias` doesnt exists yet – Juan Carlos Oropeza Feb 20 '19 at 13:49
  • Thank You, that was helpful – Vijiy Feb 20 '19 at 14:04
0

Let's decompose two queries against two tables, both containing two columns. First, we'll do a simple one:

SELECT t1.a,t2.d + 6 as e
FROM
    table1 t1
       inner join
    table2 t2
       on
           t1.a = t2.c
WHERE
    t1.b = 2
ORDER BY
    t2.c

And lets consider what is "in scope" as we complete each clause:

  • FROM table1 t1 - at this point, we have a result set containing two columns - {t1.a, t1.b}.

  • INNER JOIN table2 t2 ON ... - we now have a result set containing four columns - T1.a, t1.b, t2.c, t2.d}. We may personally also now that a and c are equal but that's irrelevant for the analysis.

  • WHERE - although WHERE can filter rows from a query, it doesn't change the set of columns making up the result set - it's still {t1.a, t1.b, t2.c, t2.d}.

  • SELECT - we don't have a GROUP BY clause, and so the job of the SELECT clause here is a) to mark some columns for output and b) possibly to add some additional columns whose values are computed. That's what we have here. We end up with a set of {O(t1.a), t1.b, t2.c, t2.d, O(e = t2.d +6)}1.

  • ORDER BY - now we order by t2.c, which is still in scope despite the fact that it won't be output

  • finally, the outputs of this query are delivered (technically via a cursor) and just contains {a, e}. The columns no longer have their "originating table" associated with them, and the non-output columns disappear into the ether.


SELECT
    t1.a,SUM(t2.d) as e
FROM
    table1 t1
       inner join
    table2 t2
       on
          t1.a = t2.c
GROUP BY t1.a
HAVING e > 5
ORDER BY t1.a
  • The FROM/JOIN clauses are identical to previously and so the same analysis prevails. Similarly we have no WHERE clause but it's irrelevant to the set of columns. We have {t1.a, t1.b, t2.c, t2.d}.

  • SELECT/GROUP BY/DISTINCT. DISTINCT and GROUP BY are really the same thing - both identify a set of columns either explicitly (GROUP BY) or by their existing in the SELECT clause. You cannot untie SELECT from GROUP BY because we also have to compute aggregates and the aggregate definitions are in the SELECT clause. For each distinct set of values evident in the grouping columns, we produce a single output row containing that set of values together with any computed aggregates. We produce here {O(t1.a), O(e)}2 and that is the result set that the remaining parts of the query can observe. The original result set is not in scope.

  • HAVING - we can work with just those columns produced by the SELECT clause3. But again, we filter rows, not columns.

  • and ORDER BY can also only work with the columns produced by the SELECT.

  • By the time SELECT was done, we only had output columns anyway but the output processing is the same anyway.


Hopefully, from the above you can see that SELECT can work in two quite different ways; but at least now you're aware of the difference and what the knock-on effects of that are.


1I'm making up terminology on the fly here, but I'm using the O() wrapper to mean "this column will be in the final result set".

2This is the behaviour you appear to have been expecting SELECT to always exhibit, only providing the "outputable" rows to later clauses.

3 contains an extension to the SQL standard that allows non-grouped and non-aggregated columns to appear as HAVING clause predicates. They're effectively re-written to be used in the WHERE clause instead.

Damien_The_Unbeliever
  • 234,701
  • 27
  • 340
  • 448