-1

I am really confused about the execution order in SQL. Basically, given any query (assume it's a complex one that has multiple JOINS, WHERE clauses, etc), is the query executed sequentially or not?

From the top answer at Order Of Execution of the SQL query, it seems like "SQL has no order of execution. ... The optimizer is free to choose any order it feels appropriate to produce the best execution time."

From the top answer at What's the execute order of the different parts of a SQL select statement?, in contrast, we see a clear execution order in the form "

  1. FROM
  2. ON
  3. OUTER
  4. WHERE ... "

I feel like I am missing something, but it seems as though the two posts are contradicting each other, and different articles online seem to support either one or the other.

But more fundamentally, what I wanted to know initially is this: Suppose we have a complex SQL query with multiple joins, INNER JOINs and LEFT JOINS, in a specific order. Is there going to be an order to the query, such that a later JOIN will apply to the result of an earlier join rather than to the initial table specified in the top FROM clause?

dmg
  • 4,231
  • 1
  • 18
  • 24
Etfrerrr
  • 89
  • 2
  • 8
  • No - that is NOT what the top answer states. It specifically says "logical order". And that has an impact on what you can reference in different clauses. E.g., you cannot refer to an expression in your SELECT list within the WHERE clause. The JOIN order is determined by parentheses as well as the ordering of joined tables. So yes - mixing of inner and outer joins within a query does have an impact if you do not enforce join order that your logic depends on. Many do not realize you can parenthesize your joins to enforce order. – SMor Jul 13 '21 at 20:33
  • @SMor - can you elaborate on what "parenthesize your joins" looks like syntactically? I see a brief reference to what you're describing at https://learn.microsoft.com/en-us/sql/t-sql/queries/from-transact-sql?view=sql-server-ver15#joined-table but can't work out what they're getting at from the grammar definition. – Ben Thul Jul 13 '21 at 20:47
  • @SMor is referring to for example `from t1 left join t2 inner join t3 on ... on ...` where the second `on` clause refers to a `left join` between `t1` and the resulting join of `t2` and `t3`. **Parenthesis makes absolutely no difference there** only the order of the `on` clauses, it's a complete myth that `()` around the joins do anything at all – Charlieface Jul 13 '21 at 21:01
  • That answer only refers to the logical order (what you are allowed to do, what things mean in particular places), not the physical execution (what the compiler builds to execute the query). Your query has certain logical constructs in it, which means that the results must conform to some specification, but the server is free to obtain the results in any way possible, including getting an alien from Mars to provide it. For example, it's not bound to execute joins in the same `select` in any particular order, nor evaluate `where` conditions in order – Charlieface Jul 13 '21 at 21:03
  • Does this answer your question? [Order Of Execution of the SQL query](https://stackoverflow.com/questions/4596467/order-of-execution-of-the-sql-query) – philipxy Dec 20 '22 at 00:44

4 Answers4

2

It is tricky. The short answer is: the DBMS will decide what order is best such that it produces the result that you have declared (remember, SQL is declarative, it does not prescribe how the query is to be computed).

But we can think of a "conceptual" order of execution that the DBMS will use to create the result. This conceptual order might be totally ignored by the DBMS, but if we (humans) follow it, we will get the same results as the DBMS. I see this as one of the benefits of a DBMS. Even if we suck and write an inefficient query, the DBMS might say, "no, no, this query you gave me sucks in terms of performance, I know how to do better" and most of the time, the DBMS is right. Sometimes it is not, and rewriting a query helps the DBMS find the "best" approach. This is very dependent of the DBMS of course...

This conceptual order help us we (humans) to understand how the DBMS executes a query. These are listed below.

First the order for non-aggregation:

  1. Do the FROM section. Includes any joins, cross products, subqueries.
  2. Do the WHERE clause (remove tuples, this is called selection)
  3. Do the SELECT portion (report results, this is called projection).

If you use an aggregation function, without a group by then:

  1. Do the FROM section. Includes any joins, subqueries.
  2. Do the WHERE clause (remove tuples, this is called selection)
  3. Do the aggregation function in the SELECT portion (converting all tuples of the result into one tuple). There is an implicit group by in this query.

If you use a group by:

  1. Do the FROM section. Includes any joins, cross products, subqueries.
  2. Do the WHERE clause (remove tuples, this is called selection)
  3. Cluster subsets of the tuples according to the GROUP BY.
  4. For each cluster of these tuples:
  • if there is a HAVING, do this predicate (similar to selection of the WHERE).Note that you can have access to aggregation functions.
  1. For each cluster of these tuples output exactly one tuple such that:
    • Do the SELECT part of the query (similar to select in above aggregation, i.e. you can use aggregation functions).

Window functions happen during the SELECT stage (they take into consideration the set of tuples that would be output by the select at that stage).

There is one more kink:

if you have

select distinct ...

then after everything else is done, then remove DUPLICATED tuples from the results (i.e. return a set of tuples, not a list).

Finally, do the ORDER BY. The ORDER BY happens in all cases at the end, once the SELECT part has been done.

With respect to JOINS. As I mentioned above, they happen at the "FROM" part of the conceptual execution. The WHERE, GROUP BY, SELECT apply on the results of these operations. So you can think of these being the first phase of the execution of the query. If it contains a subquery, the process is recursive.

By the way, you can refer in an inner query to a relation in the outside context of the inner query, but not the other way around.

All of this is conceptual. In reality the DBMS might rewrite your query for the purpose of efficiency.

For example, assume R(a,b) and S(a,c). WHere S(a) is a foreign key that references R(A).

The query:

select b from R JOIN S using (a) where a > 10

can be rewritten by the DBMS to something similar to this:

select b FROM R JOIN (select a from s where a > 10) as T using (a);

or:

select b FROM (select * from R where a > 10) as T JOIN S using (a);

In fact, the DBMS does this all the time. It takes your query, and creates alternates queries. then estimates the execution time of each query and decides which one is the most likely to be the fastest. And then it executes it.

This is a fundamental process of query evaluation. Note that the 3 queries are identical in terms of results. But depending on the sizes of the relations, they might have very different execution times. For example, if R and S are huge, but very few tuples have a>0, then the join wastes time. Each query with a subselect might perform fast if that subselect matches very few tuples, but badly if they match a lot of tuples. This is the type of "magic" that happens inside the query evaluation engine of the DBMS.

dmg
  • 4,231
  • 1
  • 18
  • 24
1

You are confusing Order of execution with Logical Query Processing.

I did a quick google search and found a bunch of articles referring to Logical Query Processing as "order of execution". Let's clear this up.

Logical Query Processing

Logical Query Processing details the under-the-hood processing phases of a SQL Query... First the WHERE clause is evaluated for the optimizer to know where to get data from, then table operators, etc.

Understanding this will help you better design and tune queries. Logical query processing order will help you understand why you can reference a column by it's alias in an ORDER BY clause but not anywhere else.

enter image description here

Order of Execution

Consider this WHERE clause:

WHERE t1.Col1 = 'X'
 AND  t2.Col2 = 1
 AND  t3.Col3 > t2.Col4

The optimizer is not required to evaluate these predicates in any order; it can evaluate t2.Col2 = 1 first, then t1.Col1 = 'X'.... The optimizer, in some cases can evaluate joins in a different order than than you have presented in your query. When predicate logic dictates that the result will be the same, it is free to make (what it considers) the best choices for optimal performance.

Sadly there is not a lot about this topic out there. I do discuss this a little more here.

Alan Burstein
  • 7,770
  • 1
  • 15
  • 18
1

First there's the SQL query and the rules of SQL that apply to it. That's what in the other answers is referred to as "Logical query processing". With SQL you specify a result. The SQL standard does not allow you to specify how this result is reached.

Then there's the query optimizer. Based on statistics, heuristics, amount of available CPU, memory and other factors, it will determine the execution plan. It will evaluate how long the execution is expected to take. It will evaluate different execution plans to find the one that executes fastest. In that process, it can evaluate execution plans that use different indexes, and/or rearranges the join order, and/or leave out (outer) joins, etc. The optimizer has many tricks. The more expensive the best execution plan is expected to be, the more (advanced) execution plans will be evaluated. The end result is one (serial) execution plan and potentially a parallel execution plan.

All the evaluated execution plans will guarantee the correct result; the result that matches execution according to the "Logical query processing".

Finally, there's the SQL Server engine. After picking either the serial or parallel execution plan, it will execute it.

Gert-Jan
  • 327
  • 1
  • 9
1

The other answers, whilst containing useful and interesting information, risk causing confusion in my view.

They all seem to introduce the notion of a "logical" order of execution, which differs from the actual order of execution, as if this is something special about SQL.

If someone asked about the order of execution of any ordinary language besides SQL, the answer would be "strictly sequential" or (for expressions) "in accordance with the rules of that language". I feel as though we wouldn't be having a long-winded exploration about how the compiler has total freedom to rearrange and rework any algorithm that the programmer writes, and distinguishing this from the merely "logical" representation in the source code.

Ultimately, SQL has a defined order of evaluation. It is the "logical" order referred to in other answers. What is most confusing to novices is that this order does not correspond with the syntactic order of the clauses in an SQL statement.

That is, a simple SELECT...FROM...WHERE...ORDER BY query would actually be evaluated by taking the table referred to in the from-clause, filtering rows according to the where-clause, then manipulating the columns (including filtering, renaming, or generating columns) according to the select-clause, and finally ordering the rows according to the order-by-clause. So clauses here are evaluated second, third, first, fourth, which is a disorderly pattern to any sensible programmer - the designers of SQL preferred to make it correspond more in their view to the structure of something spoken in ordinary English ("tell me the surnames from the register!").

Nevertheless, when the programmer writes SQL, they are specifying the canonical method by which the results are produced, same as if they write source code in any other language.

The query simplification and optimisation that database engines perform (like that which ordinary compilers perform) would be a completely separate topic of discussion, if it hadn't already been conflated. The essence of the situation on this front, is that the database engine can do whatever it damn well likes with the SQL you submit, provided that the data it returns to you is the same as if it had followed the evaluation order defined in SQL.

For example, it could sort the results first, and then filter them, despite this order of operations being clearly different to the order in which the relevant clauses are evaluated in SQL. It can do this because if you (say) have a deck of cards in random order, and go through the deck and throw away all the aces, and then sort the deck into standard order, the outcome (in terms of the final content and order of the deck) is no different than if you sort the deck into standard order first, and then go through and throw away all the aces. But the full details and rationale of this behaviour would be for a separate question entirely.

Steve
  • 950
  • 7
  • 11