71

I have my business-logic in ~7000 lines of T-SQL stored procedures, and most of them has next JOIN syntax:

SELECT A.A, B.B, C.C
FROM aaa AS A, bbb AS B, ccc AS C
WHERE
    A.B = B.ID
AND B.C = C.ID
AND C.ID = @param

Will I get performance growth if I will replace such query with this:

SELECT A.A, B.B, C.C
FROM aaa AS A
JOIN bbb AS B
   ON A.B = B.ID
JOIN ccc AS C
   ON B.C = C.ID
   AND C.ID = @param

Or they are the same?

Andy Lester
  • 91,102
  • 13
  • 100
  • 152
abatishchev
  • 98,240
  • 88
  • 296
  • 433
  • I don't know why you edited the title: it now contains misstatements. Both queries are joins and their respective syntax are both enshrined in the SQL-92 Standard. Note the Standard is international (ISO) in addition to being American (ANSI). – onedaywhen Sep 09 '11 at 12:50
  • 2
    From the Wikipedia article on [SQL](http://en.wikipedia.org/wiki/SQL): "SQL was adopted as a standard by the American National Standards Institute (ANSI) in 1986 as SQL-86 and the International Organization for Standardization (ISO) in 1987." In the literature, the qualifier ISO/IEC seems to be far more common than ANSI; I think ANSI is prevalent in forums such as Stackoverflow because products such as SQL Server and MySQL have keywords that use the word "ANSI" e.g. ANSI_NULLS (I've no idea why they don't use "ISO"); though these are my own observations, I doubt they are original thoughts ;) – onedaywhen Sep 12 '11 at 07:40
  • Further note that each Standard adopts (and adds to) the features of the previous Standard SQL, so that Standard SQL features are never deprecated, what Hugh Darwen refers to as the [*Shackle of Compatibility*](http://www.dcs.warwick.ac.uk/~hugh/TTM/HAVING-A-Blunderful-Time.html). – onedaywhen Sep 12 '11 at 07:43
  • You can can prove the above yourself by reading each Standard's spec ;) As regards proof both queries are Standard SQL: use the [Mimer SQL-92 Validator](http://developer.mimer.com/validator/parser92/index.tml): copy and paste each query one at a time into the box, edit to remove the proprietary SQL Server syntax `@param` (e.g. change it to the literal value `1`) and click 'Test SQL'. Both will generate the result , "Transitional SQL-92". – onedaywhen Sep 12 '11 at 07:52
  • You can repeat the above using the [SQL-99 validator](http://developer.mimer.com/validator/parser99/index.tml) and [SQL:2003 validator](http://developer.mimer.com/validator/parser200x/index.tml )respectively. I assume you aren't asking me to prove they are both joins and semantically equivalent, a fact which any half decent optimizer should recognize, because this is noted in the accepted answer. – onedaywhen Sep 12 '11 at 07:57

7 Answers7

91

The two queries are the same, except the second is ANSI-92 SQL syntax and the first is the older SQL syntax which didn't incorporate the join clause. They should produce exactly the same internal query plan, although you may like to check.

You should use the ANSI-92 syntax for several of reasons

  • The use of the JOIN clause separates the relationship logic from the filter logic (the WHERE) and is thus cleaner and easier to understand.
  • It doesn't matter with this particular query, but there are a few circumstances where the older outer join syntax (using + ) is ambiguous and the query results are hence implementation dependent - or the query cannot be resolved at all. These do not occur with ANSI-92
  • It's good practice as most developers and dba's will use ANSI-92 nowadays and you should follow the standard. Certainly all modern query tools will generate ANSI-92.
  • As pointed out by @gbn, it does tend to avoid accidental cross joins.

Myself I resisted ANSI-92 for some time as there is a slight conceptual advantage to the old syntax as it's a easier to envisage the SQL as a mass Cartesian join of all tables used followed by a filtering operation - a mental technique that can be useful for grasping what a SQL query is doing. However I decided a few years ago that I needed to move with the times and after a relatively short adjustment period I now strongly prefer it - predominantly because of the first reason given above. The only place that one should depart from the ANSI-92 syntax, or rather not use the option, is with natural joins which are implicitly dangerous.

Cruachan
  • 15,733
  • 5
  • 59
  • 112
  • 2
    I had a similar incline to climb 10 years ago by leaving behind the older comma/equals form of join, but have not looked back. – Larry Smith May 14 '15 at 21:34
  • 1
    When examining a large query, it's helpful to see the join logic in close proximity to the relevant tables. The older syntax can require a lot of up-and-down scrolling to examine the tables and predicates together. – durette Jun 26 '17 at 16:07
  • As a further comment on avoiding cross joins, explicit joins make it easier to spot a missing predicate when joining by a composite key. Such a missing predicate might cause insidious problems in corner cases when the final column of a large composite key has a very low cardinality, so they're not always obvious during testing. The cleaner syntax makes these problems more obvious at development time. – durette Jun 26 '17 at 16:13
5

The second construct is known as the "infixed join syntax" in the SQL community. The first construct AFAIK doesn't have widely accepted name so let's call it the 'old style' inner join syntax.

The usual arguments go like this:

Pros of the 'Traditional' syntax: the predicates are physically grouped together in the WHERE clause in whatever order which makes the query generally, and n-ary relationships particularly, easier to read and understand (the ON clauses of the infixed syntax can spread out the predicates so you have to look for the appearance of one table or column over a visual distance).

Cons of the 'Traditional' syntax: There is no parse error when omitting one of the 'join' predicates and the result is a Cartesian product (known as a CROSS JOIN in the infixed syntax) and such an error can be tricky to detect and debug. Also, 'join' predicates and 'filtering' predicates are physically grouped together in the WHERE clause, which can cause them to be confused for one another.

onedaywhen
  • 55,269
  • 12
  • 100
  • 138
4

The two queries are equal - the first is using non-ANSI JOIN syntax, the 2nd is ANSI JOIN syntax. I recommend sticking with the ANSI JOIN syntax.

And yes, LEFT OUTER JOINs (which, btw are also ANSI JOIN syntax) are what you want to use when there's a possibility that the table you're joining to might not contain any matching records.

Reference: Conditional Joins in SQL Server

OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
4

OK, they execute the same. That's agreed. Unlike many I use the older convention. That SQL-92 is "easier to understand" is debatable. Having written programming languages for pushing 40 years (gulp) I know that 'easy to read' begins first, before any other convention, with 'visual acuity' (misapplied term here but it's the best phrase I can use). When reading SQL the FIRST thing you mind cares about is what tables are involved and then which table (most) defines the grain. Then you care about relevant constraints on the data, then the attributes selected. While SQL-92 mostly separates these ideas out, there are so many noise words, the mind's eye has to interpret and deal with these and it makes reading the SQL slower.

SELECT Mgt.attrib_a   AS attrib_a
      ,Sta.attrib_b   AS attrib_b
      ,Stb.attrib_c   AS attrib_c
FROM   Main_Grain_Table  Mgt
      ,Surrounding_TabA  Sta
      ,Surrounding_tabB  Stb
WHERE  Mgt.sta_join_col  = Sta.sta_join_col
AND    Mgt.stb_join_col  = Stb.stb_join_col
AND    Mgt.bus_logic_col = 'TIGHT'

Visual Acuity! Put the commas for new attributes in front It makes commenting code easier too Use a specific case for functions and keywords Use a specific case for tables Use a specific case for attributes Vertically Line up operators and operations Make the first table(s) in the FROM represent the grain of the data Make the first tables of the WHERE be join constraints and let the specific, tight constraints float to the bottom. Select 3 character alias for ALL tables in your database and use the alias EVERYWHERE you reference the table. You should use that alias as a prefix for (many) indexes on that table as well. 6 of 1 1/2 dozen of another, right? Maybe. But even if you're using ANSI-92 convention (as I have and in cases will continue to do) use visual acuity principles, verticle alignment to let your mind's eye avert to the places you want to see and and easily avoid things (particularly noise words) you don't need to.

3

Execute both and check their query plans. They should be equal.

sisve
  • 19,501
  • 3
  • 53
  • 95
  • 3
    +1 to using the analysis tools. I would recommend looking at: for more "sugar-filled" ways of specifying joins. Let the standards be your guide, but remember that your particular DBs implementation is the path (and hopefully it's not MySQL :) –  Oct 21 '09 at 06:43
1

In my mind the FROM clause is where I decide what columns I need in the rows for my SELECT clause to work on. It is where a business rule is expressed that will bring onto the same row, values needed in calculations. The business rule can be customers who have invoices, resulting in rows of invoices including the customer responsible. It could also be venues in the same postcode as clients, resulting in a list of venues and clients that are close together.

It is where I work out the centricity of the rows in my result set. After all, we are simply shown the metaphor of a list in RDBMSs, each list having a topic (the entity) and each row being an instance of the entity. If the row centricity is understood, the entity of the result set is understood.

The WHERE clause, which conceptually executes after the rows are defined in the from clause, culls rows not required (or includes rows that are required) for the SELECT clause to work on.

Because join logic can be expressed in both the FROM clause and the WHERE clause, and because the clauses exist to divide and conquer complex logic, I choose to put join logic that involves values in columns in the FROM clause because that is essentially expressing a business rule that is supported by matching values in columns.

i.e. I won't write a WHERE clause like this:

 WHERE Column1 = Column2

I will put that in the FROM clause like this:

 ON Column1 = Column2

Likewise, if a column is to be compared to external values (values that may or may not be in a column) such as comparing a postcode to a specific postcode, I will put that in the WHERE clause because I am essentially saying I only want rows like this.

i.e. I won't write a FROM clause like this:

 ON PostCode = '1234'

I will put that in the WHERE clause like this:

 WHERE PostCode = '1234'
John
  • 11
  • 2
  • 1
    `ON AND PostCode = '1234'` can be useful on `left joins` for conditional joining, but not excluding rows with PostCode <> '1234' – bummi Oct 18 '14 at 22:35
0

ANSI syntax does enforce neither predicate placement in the proper clause (be that ON or WHERE), nor the affinity of the ON clause to adjacent table reference. A developer is free to write a mess like this

SELECT
   C.FullName,
   C.CustomerCode,
   O.OrderDate,
   O.OrderTotal,
   OD.ExtendedShippingNotes
FROM
   Customer C
   CROSS JOIN Order O
   INNER JOIN OrderDetail OD
      ON C.CustomerID = O.CustomerID
      AND C.CustomerStatus = 'Preferred'
      AND O.OrderTotal > 1000.0
WHERE
   O.OrderID = OD.OrderID;

Speaking of query tools who "will generate ANSI-92", I'm commenting here because it generated

SELECT 1
   FROM DEPARTMENTS C
        JOIN EMPLOYEES A
             JOIN JOBS B
     ON C.DEPARTMENT_ID = A.DEPARTMENT_ID
     ON A.JOB_ID = B.JOB_ID

The only syntax that escapes conventional "restrict-project-cartesian product" is outer join. This operation is more complicated because it is not associative (both with itself and with normal join). One have to judiciously parenthesize query with outer join, at least. However, it is an exotic operation; if you are using it too often I suggest taking relational database class.

Tegiri Nenashi
  • 3,066
  • 2
  • 19
  • 20