Does PostgreSQL always implement FROM ... WHERE in the same way as FROM ... JOIN ... ON ...?

Question

In PostgreSQL document, there are two examples:

EXPLAIN SELECT *
FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Hash Join  (cost=230.47..713.98 rows=101 width=488)
   Hash Cond: (t2.unique2 = t1.unique2)
   ->  Seq Scan on tenk2 t2  (cost=0.00..445.00 rows=10000 width=244)
   ->  Hash  (cost=229.20..229.20 rows=101 width=244)
         ->  Bitmap Heap Scan on tenk1 t1  (cost=5.07..229.20 rows=101 width=244)
               Recheck Cond: (unique1 < 100)
               ->  Bitmap Index Scan on tenk1_unique1 
 (cost=0.00..5.04 rows=101 width=0)
                     Index Cond: (unique1 < 100)

and

EXPLAIN SELECT *
FROM tenk1 t1, onek t2
WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Merge Join  (cost=198.11..268.19 rows=10 width=488)
   Merge Cond: (t1.unique2 = t2.unique2)
   ->  Index Scan using tenk1_unique2 on tenk1 t1  (cost=0.29..656.28 rows=101 width=244)
         Filter: (unique1 < 100)
   ->  Sort  (cost=197.83..200.33 rows=1000 width=244)
         Sort Key: t2.unique2
         ->  Seq Scan on onek t2  (cost=0.00..148.00 rows=1000 width=244)

Hash join and merge join are ways to implement joins (explicitly in SQL, FROM ... JOIN ... ON ... ).

Does PostgreSQL implement FROM ... WHERE in the above examples in the same way as FROM ... JOIN ... ON ...? Does it always do so?

Generally in PostgreSQL, I was wondering when to use FROM ... JOIN ... ON ... and when to use ... FROM ... WHERE ..., when both seem equivalent effectively?

Thanks.

I read some similar posts for other RDBMS such as INNER JOIN ON vs WHERE clause, SQL join: where clause vs. on clause, WHERE Clause vs ON when using JOIN, and Filtering JOINs: WHERE vs. ON. Most of them talk about

human readability,
inner join vs outer join, also see In outer join, where does a plain filter condition come from?
FROM ... WHERE ... will implement WHERE selection on the result of FROM cartesian product, while FROM ... JOIN ... ON ... will implement ON selection during creating cartesian product.

*Always* use proper, explicit, *standard* `JOIN` syntax. That is the right way to write a `JOIN` query. Commas were used in the earliest versions of SQL (think 1980s). We have progressed a lot since then. — Gordon Linoff, Jun 22 '18 at 12:25
Thanks. Why "Always use proper, explicit, standard JOIN syntax"? The examples are from the latest postgresql document in 2018 as far as commas are concerned. — Tim, Jun 22 '18 at 12:38
. . Unfortunately, code in documentation gets updated about once per century. `JOIN` is simply more powerful, because it supports different types of joins and convenient because it allows the `USING` clause (forget `NATURAL JOIN`, it is a bad construct). Most people find it more readable as well, because the conditions are next to the tables references, so the conditions are easy to validate. — Gordon Linoff, Jun 22 '18 at 12:43
@Tim, `natural join` is considered bad because it is not explicit. database schema changes and you may end up with additional columns with the same name on either side of join, changing the behaviour completely & silently. — Haleemur Ali, Jun 22 '18 at 14:00
@HaleemurAli "end up with additional columns with the same name on either side of join", but natural join eliminates duplicate columns. I don't know what you mean. — Tim, Jun 22 '18 at 14:03
@Tim Consider tables `order` and `orderline`, with only `orderid` as the matching column. Now a natural join will do what you want. Later on, someone decides that for performance reasons, `order` needs an extra column `amount`. `orderline` already had a column by that name. The existing query, the one using the natural join, is now broken. It will still run without any error messages, but it will no longer do what was intended. — , Jun 22 '18 at 14:06
consider two tables having initial schema `t1 (a, b, c, d)` & `t2 (a, b, x, y)`. over time the app developers realize they want `t2` to have an additional column and decide to give it the name `c`, unaware that there is a natural join lurking around. can you guess what happens in this scenario? — Haleemur Ali, Jun 22 '18 at 14:06
natural join uses the columns with the same name, not a declared foreign key. http://rextester.com/TZF86758 — , Jun 22 '18 at 14:09
Should be called an "Assumption Join" with an abbreviated synonym of "Ass Join". I kid, I kid. Kind of. — JNevill, Jun 22 '18 at 14:41
Please ask one question per post. The answer to your title depends on the details of a particular version of the optimizer. Because it's *implemention* not specified behaviour. But the general principles are chapters of textbooks & documentation. Read some. Anyway this is a (too-broad) SO faq. Whether a test is in on or where is immaterial for inner join. See [this](https://stackoverflow.com/a/25957600/3404097). Your natural join question is also a SO faq. But see [this](https://stackoverflow.com/a/35034568/3404097). — philipxy, Jun 23 '18 at 02:40
My current generic comment re "better"/"best" etc: There's no such thing as "better"/"best" in engineering unless *you* define it. Also unfortunately all reasonable practical definitions require a ridiculous amount of experience with a ridiculous number of factors that interact with chaotic sensitivity to details. Make straightforward designs. When you demonstrate via measurement that a design and all alternatives you can think of have problems (whatever that means at the time), then ask a very specific question. Which should also define "better"/"best". https://meta.stackexchange.com/q/204461 — philipxy, Jun 23 '18 at 02:40

arnt · Answer 1 · 2018-07-13T12:36:33.613

If two queries mean the same, then postgresql tries to execute them in the same way. However, that doesn't always work. It's the goal, but it doesn't work always, mostly because the planner doesn't realise that the two necessarily at the same. There may be deficiencies in your database schema, and of course there may be bugs or insufficient insight in the query planner.

Rewriting a heavily used query to an equivalent form that executes better is quite common. (One of the reasons that I personally like postgresql so much is that the rewritten query is usually a better expression of my intent than my original query was.)

The answer to your question may be summed up as "yes in theory and often in practice".

Does PostgreSQL always implement FROM ... WHERE in the same way as FROM ... JOIN ... ON ...?

1 Answers1