29

the two bits of SQL below get the same result

SELECT c.name, o.product  
FROM customer c, order o  
WHERE c.id = o.cust_id  
AND o.value = 150  

SELECT c.name, o.product  
FROM customer c  
INNER JOIN order o on c.id = o.cust_id  
WHERE o.value = 150

I've seen both styles used as standard at different companies. From what I've seen, the 2nd one is what most people recommend online. Is there any real reason for this other than style? Does using an Inner Join sometimes have better performance?

I've noticed Ingres and Oracle developers tend to use the first style, whereas Microsoft SQL Server users have tended to use the second, but that might just be a coincidence.

Thanks for any insight, I've wondered about this for a while.

Edit: I've changed the title from 'SQL Inner Join versus Cartesian Product' as I was using the incorrect terminlogy. Thanks for all the responses so far.

Vlad Mihalcea
  • 142,745
  • 71
  • 566
  • 911
Dean Madden
  • 341
  • 3
  • 6
  • 14

7 Answers7

29

Both queries are an inner joins and equivalent. The first is the older method of doing things, whereas the use of the JOIN syntax only became common after the introduction of the SQL-92 standard (I believe it's in the older definitions, just wasn't particularly widely used before then).

The use of the JOIN syntax is strongly preferred as it separates the join logic from the filtering logic in the WHERE clause. Whilst the JOIN syntax is really syntactic sugar for inner joins it's strength lies with outer joins where the old * syntax can produce situations where it is impossible to unambiguously describe the join and the interpretation is implementation-dependent. The [LEFT | RIGHT] JOIN syntax avoids these pitfalls, and hence for consistency the use of the JOIN clause is preferable in all circumstances.

Note that neither of these two examples are Cartesian products. For that you'd use either

SELECT c.name, o.product  
FROM customer c, order o  
WHERE o.value = 150  

or

SELECT c.name, o.product  
FROM customer c  CROSS JOIN order o 
WHERE o.value = 150
Vishwanath Dalvi
  • 35,388
  • 41
  • 123
  • 155
Cruachan
  • 15,733
  • 5
  • 59
  • 112
  • 1
    The JOIN notation was introduced in SQL-92; it didn't become prevalent until much later. – Jonathan Leffler Dec 27 '08 at 21:52
  • Thanks for the correction, I did check to see if I could find the date. Personally I really only converted to using JOINS around 2001 or 2 - I found the usage a bit strange at first but much prefer it now. – Cruachan Dec 28 '08 at 10:28
  • Could you please give an example of such a situation where the old syntax is necessarily ambiguous? Thank you. – Evgeniy Dec 04 '19 at 18:48
6

To answer part of your question, I think early bugs in the JOIN ... ON syntax in Oracle discouraged Oracle users away from that syntax. I don't think there are any particular problems now.

They are equivalent and should be parsed into the same internal representation for optimization.

WW.
  • 23,793
  • 13
  • 94
  • 121
  • Some of the bugs in the ANSI JOIN syntax aren't what I would call "early". At a previous job where we did a lot of huge queries, we ran into a hard limit on the number of total columns you could have in a query if you used the ANSI JOIN syntax. You'll find that at least as late as the 10g release there were still problems. – Nate C-K Aug 28 '12 at 12:56
  • Yeah, I still don't use ANSI joins. – WW. Aug 28 '12 at 13:41
4

Actually these examples are equivalent and neither is a cartesian product. A cartesian product is returned when you join two tables without specifying a join condition, such as in

select *
from t1,t2

There is a good discussion of this on Wikipedia.

Ed Guiness
  • 34,602
  • 16
  • 110
  • 145
4

Oracle was late in supporting the JOIN ... ON (ANSI) syntax (not until Oracle 9), that's why Oracle developers often don't use it.

Personally, I prefer using ANSI syntax when it is logically clear that one table is driving the query and the others are lookup tables. When tables are "equal", I tend to use the cartesian syntax.

The performance should not differ at all.

erikkallen
  • 33,800
  • 13
  • 85
  • 120
3

Both queries are performing an inner join, just different syntax.

Brian Knoblauch
  • 20,639
  • 15
  • 57
  • 92
Jim Anderson
  • 3,602
  • 2
  • 24
  • 21
3

The JOIN... ON... syntax is a more recent addition to ANSI and ISO specs for SQL. The JOIN... ON... syntax is generally preferred because it 1) moves the join criteria out of the WHERE clause making the WHERE clause just for filtering and 2) makes it more obvious if you are creating a dreaded Cartesian product since each JOIN must be accompanied by at least one ON clause. If all the join criteria are just ANDed in the WHERE clause, it's not as obvious when one or more is missing.

2

TL;DR

An INNER JOIN statement can be rewritten as a CROSS JOIN with a WHERE clause matching the same condition you used in the ON clause of the INNER JOIN query.

Table relationship

Considering we have the following post and post_comment tables:

The post and post_comment tables

The post has the following records:

| id | title     |
|----|-----------|
| 1  | Java      |
| 2  | Hibernate |
| 3  | JPA       |

and the post_comment has the following three rows:

| id | review    | post_id |
|----|-----------|---------|
| 1  | Good      | 1       |
| 2  | Excellent | 1       |
| 3  | Awesome   | 2       |

SQL INNER JOIN

The SQL JOIN clause allows you to associate rows that belong to different tables. For instance, a CROSS JOIN will create a Cartesian Product containing all possible combinations of rows between the two joining tables.

While the CROSS JOIN is useful in certain scenarios, most of the time, you want to join tables based on a specific condition. And, that's where INNER JOIN comes into play.

The SQL INNER JOIN allows us to filter the Cartesian Product of joining two tables based on a condition that is specified via the ON clause.

SQL INNER JOIN - ON "always true" condition

If you provide an "always true" condition, the INNER JOIN will not filter the joined records, and the result set will contain the Cartesian Product of the two joining tables.

For instance, if we execute the following SQL INNER JOIN query:

SELECT
   p.id AS "p.id",
   pc.id AS "pc.id"
FROM post p
INNER JOIN post_comment pc ON 1 = 1

We will get all combinations of post and post_comment records:

| p.id    | pc.id      |
|---------|------------|
| 1       | 1          |
| 1       | 2          |
| 1       | 3          |
| 2       | 1          |
| 2       | 2          |
| 2       | 3          |
| 3       | 1          |
| 3       | 2          |
| 3       | 3          |

So, if the ON clause condition is "always true", the INNER JOIN is simply equivalent to a CROSS JOIN query:

SELECT
   p.id AS "p.id",
   pc.id AS "pc.id"
FROM post p
CROSS JOIN post_comment
WHERE 1 = 1
ORDER BY p.id, pc.id

SQL INNER JOIN - ON "always false" condition

On the other hand, if the ON clause condition is "always false", then all the joined records are going to be filtered out and the result set will be empty.

So, if we execute the following SQL INNER JOIN query:

SELECT
   p.id AS "p.id",
   pc.id AS "pc.id"
FROM post p
INNER JOIN post_comment pc ON 1 = 0
ORDER BY p.id, pc.id

We won't get any result back:

| p.id    | pc.id      |
|---------|------------|

That's because the query above is equivalent to the following CROSS JOIN query:

SELECT
   p.id AS "p.id",
   pc.id AS "pc.id"
FROM post p
CROSS JOIN post_comment
WHERE 1 = 0
ORDER BY p.id, pc.id

SQL INNER JOIN - ON clause using the Foreign Key and Primary Key columns

The most common ON clause condition is the one that matches the Foreign Key column in the child table with the Primary Key column in the parent table, as illustrated by the following query:

SELECT
   p.id AS "p.id",
   pc.post_id AS "pc.post_id",
   pc.id AS "pc.id",
   p.title AS "p.title",
   pc.review  AS "pc.review"
FROM post p
INNER JOIN post_comment pc ON pc.post_id = p.id
ORDER BY p.id, pc.id

When executing the above SQL INNER JOIN query, we get the following result set:

| p.id    | pc.post_id | pc.id      | p.title    | pc.review |
|---------|------------|------------|------------|-----------|
| 1       | 1          | 1          | Java       | Good      |
| 1       | 1          | 2          | Java       | Excellent |
| 2       | 2          | 3          | Hibernate  | Awesome   |

So, only the records that match the ON clause condition are included in the query result set. In our case, the result set contains all the post along with their post_comment records. The post rows that have no associated post_comment are excluded since they can not satisfy the ON Clause condition.

Again, the above SQL INNER JOIN query is equivalent to the following CROSS JOIN query:

SELECT
   p.id AS "p.id",
   pc.post_id AS "pc.post_id",
   pc.id AS "pc.id",
   p.title AS "p.title",
   pc.review  AS "pc.review"
FROM post p, post_comment pc
WHERE pc.post_id = p.id

The non-struck rows are the ones that satisfy the WHERE clause, and only these records are going to be included in the result set. That's the best way to visualize how the INNER JOIN clause works.

| p.id | pc.post_id | pc.id | p.title   | pc.review |
|------|------------|-------|-----------|-----------|
| 1    | 1          | 1     | Java      | Good      |
| 1    | 1          | 2     | Java      | Excellent |
| 1    | 2          | 3     | Java      | Awesome   |
| 2    | 1          | 1     | Hibernate | Good      |
| 2    | 1          | 2     | Hibernate | Excellent |
| 2    | 2          | 3     | Hibernate | Awesome   |
| 3    | 1          | 1     | JPA       | Good      |
| 3    | 1          | 2     | JPA       | Excellent |
| 3    | 2          | 3     | JPA       | Awesome   |

Not that this only applies to INNER JOIN, not for OUTER JOIN.

Vlad Mihalcea
  • 142,745
  • 71
  • 566
  • 911