1

There are a few discussions on StackOverflow of commutativity of joins, such as 1, 2, 3, and 4. It gets pretty complicated, and I don't think any of them answers my question here.

I've often noticed that when I put the SQL for a query into Access, Access messes with the order of my joins, and even changes LEFT joins to RIGHT ones. I generally write my outer joins as LEFTs, laying out the sequence of joined tables logically, so I don't like it when Access messes that up. But now I've noticed a difference in how a query is represented in Design View, and I want to know if the difference is significant.

Here are three queries:

Query1

SELECT Table1.ID_Table1
FROM Table2 RIGHT JOIN (Table1 LEFT JOIN Table12 
    ON Table1.ID_Table1 = Table12.ID_Table1) ON Table2.ID_Table2 = Table12.ID_Table2;

Query2

SELECT Table1.ID_Table1
FROM Table1 LEFT JOIN (Table2 RIGHT JOIN Table12 
    ON Table2.ID_Table2 = Table12.ID_Table2) ON Table1.ID_Table1 = Table12.ID_Table1;

Query3

SELECT Table1.ID_Table1
FROM Table1 LEFT JOIN (Table12 LEFT JOIN Table2 
    ON Table12.ID_Table2 = Table2.ID_Table2) ON Table1.ID_Table1 = Table12.ID_Table1;

I prefer Query3 because it has the logical order I mentioned.

When I entered these queries as SQL, Access changed the code of Query2 to be the same as Query1, and it did not modify the code of Query1 or Query3. When the three queries are run (with very simple data, so not conclusive), they all produce the same result. In Design View, Query1 and Query2 look identical, which is good since Access converted Query2 into Query1. Query3 looks almost the same, except that the relationship between Table2 and Table12, which is represented in the view of Query1 and Query2 is not represented in the view of Query3.

Three queries testing commutativity of SQL joins

So my question is, is Query3 equivalent to Query1 and Query2, so that it's safe to ignore the loss of the representation of the table relationships in Design View, or is there an operational difference in the queries? If there is a difference, is there a guiding principle for ordering of joins?

NewSites
  • 1,402
  • 2
  • 11
  • 26
  • If you think you know how left & right join return, why don't you think they are equivalent? – philipxy Jan 23 '19 at 04:19
  • Two reasons. Because commutativity is a complex phenomenon with sometimes surprising results, so it's not safe to just assume the order of things that look equivalent can just be switched. And because something has caused one of the table relationships to disappear in the GUI, which suggests that something is different. – NewSites Jan 23 '19 at 05:17

1 Answers1

1

This is not an answer but a comment that doesn't fit in the comments section.

...Access messes with the order of my joins, and even changes LEFT joins to RIGHT ones...

Well, the thing is that SQL is a declarative language. Unlike imperative languages such as C, C++, Java, PHP, etc. you don't specify how to retrieve the data you need, but instead you tell the SQL engine what you need.

When you run a SQL statement, the database engine will process the query in several "phases", from parsing, caching, applying parameters, rephrasing, planning, optimizing, executing, pipelining, delivering, etc. Each database engine will add more phases to the ones that I mention.

The ones of interest for this question are rephrasing, planning & optimizing.

  • Very often the engine will rephrase the SQL to better manage it. There can be a million reasons why the engine developers decided the engine will manage a plan better if it's written "upside down". You can't really control this. Sometime there are really valid reasons you can understand; other times it has to do with the internal organization of the database engine.

  • Then, the SQL planner considers the tables you mention, the filtering conditions you specified, the existing indexes, the statistics of the tables (up to date or stale), and produces a set of plans. All these plans will produce a valid result.

  • This set of plans is then fed to the optimizer, that will assess the cost (elapsed time, disk I/O, memory consumption, network bandwidth, etc.) and will select the "best one" to the best of its knowledge. Please conside that to the best of its knowledge can be pretty lame, depending on how weak or great the optimizer is, and more often than not, the selected "optimal" plan may not be the greatest.

Now, to be realistic, MS-Access is definitively not the greatest of the database engines out there. You might expect the SQL planner and optimizers are not awesome. They do their job, but they are designed for a modest amount of data, not millions of rows in mind.

Bottom line, do not expect the execution plan will match exactly what you typed. SQL is a declarative language and the optimizer is free to rewrite and massage the query as much as it wants, to store and retrieve data in an efficient and safe way.

The Impaler
  • 45,731
  • 9
  • 39
  • 76