Are there any reasons why SQL Server allows JOIN conditions that don't depend on the joined table at all?

Question

CREATE TABLE y (Id  INT NOT NULL PRIMARY KEY      );
CREATE TABLE x (YId INT NOT NULL REFERENCES y (Id));
DECLARE @YId INT = …;

If I wanted to select all records in x where YId is equal to @YId, I could to this three different ways:

/* 1: */  SELECT x.* FROM x                              WHERE x.YId = @YId;
/* 2: */  SELECT x.* FROM x INNER JOIN y ON                    x.YId = @YId;
/* 3: */  SELECT x.* FROM x INNER JOIN y ON x.YId = y.Id WHERE y.Id  = @YId;

Query 1 is obviously the most straight-forward query, and the one with the best performance. (See the execution plan further below.) Query 3 might be more complicated than strictly necessary, but is perhaps also a fairly common solution.

What surprises me is query 2 (which, according to the execution plan, gets executed just like query 3): Its INNER JOIN clause does not depend on y at all!

Question: Why does SQL Server even accept this seemingly nonsensical join condition as valid? Are there any specific reasons or situations that explain why the ON condition is not required to refer to the joined table?

Execution plans:

Execution plans for queries 1, 2, and 3

Btw. I wasn't sure if this would be more suitable for [dba.stackexchange.com](http://dba.stackexchange.com); if so, please don't hesitate to migrate it. — stakx - no longer contributing, Sep 17 '14 at 11:41
The three queries here are definitely not equivalent. Depending on the data they could all give different results. So comparing the exectution plans isn't really very informative. — nvogel, Sep 17 '14 at 11:55
@tschmit007: I'm not sure anymore, now that you're asking. (Would the queries be guaranteed to be equivalent if I changed them all to `SELECT DISTINCT …`?) — stakx - no longer contributing, Sep 17 '14 at 12:03
the join clause (implicit or explicit, cartesian by default) will build the tables product on which tou will apply the where clause. A distinct will may be allow to get the same result, but the number of explored rows will vary a lot. 1/ all rows from table X, 2/ all row from table X where cond * all rows from table y, 3/ all rows from table X related to table Y. So yes, execution plans are differents. — tschmit007, Sep 17 '14 at 12:08
@tschmit007: What kind of "equivalence" are we talking about here? Are we talking just about the final results of these queries, or also about the execution plans? I am aware that my three queries are not executed in the same fashion, and that one is likely more efficient than the other two. But do these queries actually give the same final results? I thought so. — stakx - no longer contributing, Sep 17 '14 at 12:11
imho, they may be give the same answer, but not to the same question. :) — tschmit007, Sep 17 '14 at 12:13
@tschmit007: Are *you* sure about that? I can see that if the 3 queries are likened to questions, they are obviously not identical, but I thought they are equivalent (in the sense that they must necessarily lead to the same answers, perhaps except for the `DISTINCT` issue). — stakx - no longer contributing, Sep 17 '14 at 12:16
@stakx, they won't necessarily give the same results. 1 is self-explanatory. 2 depends on there being at least one row in Y. 3 depends on there being a matching Id row in Y and will return rows from X that have a different Id. They are all very different queries in both semantics and execution. — nvogel, Sep 17 '14 at 12:16
@sqlvogel: I don't understand. Q2 does *not* depend on `Y` having at least one row. If `Y` does not contain any records, then the join will simply lead to an empty set, i.e. no results. Same happens with Q1, but perhaps due to different internal execution. And why should Q3 return records from X with a *different* ID? How is that possible when all I'm doing is an equi-join and an equality comparison? (Again: I'm aware that SQL Server might internally have to look at rows that won't be in the final result. Are we really talking only about the final query result?) — stakx - no longer contributing, Sep 17 '14 at 12:22
@stkx, I misread the join clauses. I believe they are equivalent assuming the foreign key is enforced. That doesn't answer your question but perhaps my answer below does. — nvogel, Sep 17 '14 at 12:42
What makes you think the join is "nonsensical"? (You seem to have some misconceptions about JOIN, ON and WHERE, and for that matter REFERENCES.) (See [this](https://stackoverflow.com/a/24749002/3404097) and [this](https://stackoverflow.com/a/24909854/3404097).) — philipxy, Sep 19 '14 at 08:05
Re misconceptions: All joins have a straightforward meaning, so saying one is "nonsensical" is nonsensical. ON and WHERE "mean" AND, although ON binds tighter; since you expect the ON to be required to mention both tables, you have some other mental model. If I get a chance I'll write a (concise) answer specific to your question. Ultimately though: The DMBS optimizer just isn't smart enough. — philipxy, Sep 19 '14 at 08:41
@philipxy: All joins might have a straightforward *technical* meaning and definition of what exactly they do. When I said "seemingly nonsensical", I referred to what the join is trying to accomplish WRT the end result: from that perspective, it appears to be completely redundant (except that it might duplicate rows; I really should have included `DISTINCT` in the queries, because that's what I intended all along). The intention of that join is actually a simple `WHERE`, and it's "nonsensical" (even wrong) to make a simple `WHERE` appear more complicated by hiding it behind a redundant join. — stakx - no longer contributing, Sep 19 '14 at 08:49
By "mean" I too mean "WRT the end result". (Given the FK:) I agree 2 is redundant. 3 is *more* redundant. Since the sense is the same, I wouldn't expect "nonsensical". But if so 3 is also nonsensical. — philipxy, Sep 19 '14 at 09:02

Pred · Accepted Answer · 2014-09-17T13:18:06.823

SQL Server enables it, because the JOIN syntax is the following (from MSDN):

<joined_table> ::= 
{
    <table_source> <join_type> <table_source> ON <search_condition> 
    | <table_source> CROSS JOIN <table_source> 
    | left_table_source { CROSS | OUTER } APPLY right_table_source 
    | [ ( ] <joined_table> [ ) ] 
}
<join_type> ::= 
    [ { INNER | { { LEFT | RIGHT | FULL } [ OUTER ] } } [ <join_hint> ] ]
    JOIN

Where the <search_condition> is the following:

Specifies the condition on which the join is based. The condition can specify any predicate, although columns and comparison operators are frequently used.

When the condition specifies columns, the columns do not have to have the same name or same data type; however, if the data types are not the same, they must be either compatible or types that SQL Server can implicitly convert. If the data types cannot be implicitly converted, the condition must explicitly convert the data type by using the CONVERT function.

There can be predicates that involve only one of the joined tables in the ON clause. Such predicates also can be in the WHERE clause in the query. Although the placement of such predicates does not make a difference for INNER joins, they might cause a different result when OUTER joins are involved. This is because the predicates in the ON clause are applied to the table before the join, whereas the WHERE clause is semantically applied to the result of the join.

For more information about search conditions and predicates, see Search Condition (Transact-SQL).

The important part is highlighted with bold.

Here is an example:

SELECT
  U.UserName, P.PrivilegeName, P.Status
FROM
 dbo.Users U
LEFT JOIN dbo.Privileges P
  ON U.UserID = P.UserID AND P.PrivilegeName = 'EditComment'

This query lists all users and adds the status of the 'EditComment' privilege. If you move the condition to the WHERE clause, the result is different. (Only users with the 'EditComment' privilege will be listed.)

The above query will list all users with NULL in the Status and PrivilegeName columns in the result if there is no matching row.

The same result could be achived with a subquery:

SELECT
  U.UserName, P.PrivilegeName, P.Status
FROM
 dbo.Users U
LEFT JOIN (SELECT UserID, Status FROM dbo.Privileges WHERE P.PrivilegeName = 'EditComment') P
  ON U.UserID = P.UserID

So this is some kind of shortand of the subquery approach.

Moving the condition to the WHERE clause

SELECT
  U.UserName, P.PrivilegeName, P.Status
FROM
 dbo.Users U
LEFT JOIN dbo.Privileges P
  ON U.UserID = P.UserID
WHERE
  P.PrivilegeName = 'EditComment'

If the condition is in the WHERE clause, the result will be filtered to that rows which has a mathing row in the Privileges table and that row's PrivilegeName is EditComment. This is basically an INNER JOIN. (Filtering to a column in the WHERE clause which is in a LEFT JOINed table filters all NULL values from the result - except if there is an OR condition with IS NULL)

Execution Plans

The difference in the execution plans are because of the logical processing of the SELECT queries. FROM, ON, JOIN, WHERE, GROUP BY, WITH CUBE or WITH ROLLUP, HAVING, SELECT, DISTINCT, ORDER BY, TOP

JOIN is processed before WHERE.

Change the INNER JOIN in your query to LEFT or any OUTER join and analyze the query. The results could be different.

EDIT

The second query looks like a filtered cross-join:

/* 4: */ SELECT x.* FROM x INNER JOIN y ON                    1=1;
/* 5: */ SELECT x.* FROM x CROSS JOIN y

When you change the ON condition in the 4th query to your original one in the 2nd query, that equals to

/* 6: */ SELECT x.* FROM x CROSS JOIN y WHERE x.YId = @YId

Why does SQL Server even accept this seemingly nonsensical join condition as valid?

Because it is valid (it has a different meaning, but syntactically it is valid)

Are there any specific reasons or situations that explain why the ON condition is not required to refer to the joined table?

Reason? yes, it is syntactically valid

Situation: CROSS JOIN with filtering for example. Join 1 row from a table to all records in the resultset (e.g. the record of the current date from a point-in-time/date/calendar table for reporting purposes)

+1 for such a detailed answer. The documentation citation is a good hint. However I don't quite understand two things about your example. First, your (composite) `ON` condition refers to both tables, so it doesn't seem quite the same thing I'm enquiring about. Why does this example still fit my question? Second, *how exactly* would the result differ if you moved the `P.PrivilegeName = '…'` to a final `WHERE` clause? — stakx - no longer contributing, Sep 17 '14 at 12:06
Yes, this is a bit different example to demonstrate the possible usages of the single table predicates in on conditions. Just a sec and I'll update the answer with the explanation of the `WHERE` clause. — Pred, Sep 17 '14 at 12:08
@stakx I've just added some additinal info (to the end of the answer) — Pred, Sep 17 '14 at 12:38

score 1 · Answer 2 · answered Sep 17 '14 at 12:36

trying to answer, shortly, and repeating @Pred answer

I think the answer is, imho, yes. It allows shorcut for subquery.

2 / SELECT x.* FROM x INNER JOIN y ON                    x.YId = @YId

is logically equivalent to

select
    tx.*
from
    (
        select x.* from x where x.YId = @YId
    ) tx
    cross join y

this allows to reduce the number of explored rows from x * y.

as an illustration :

declare @ty table (Id int not null primary key)
declare @tx table (Id Char(1) not null primary key, yId int not null)

insert into @ty values (1) ,(2), (3)
insert into @tx values ('A', 1), ('B', 1), ('C', 2)

declare @YId int = 1

SELECT x.* FROM @tx x                             WHERE x.YId = @YId
SELECT x.* FROM @tx x INNER JOIN @ty y ON   x.yId = @YId
SELECT x.* FROM @tx x INNER JOIN @ty y ON x.YId = y.Id WHERE y.Id  = @YId

will give

yes a distinct will give the same results.

On the other end: why a join if you neither need to return or test from the joined table ?

Isn't the real question, or a reformulated question: why the query optimizer isn't more smart ?

Thanks for the example, and confirming that `DISTINCT` would indeed make all three queries return the same result. — stakx - no longer contributing, Sep 17 '14 at 15:17
Regarding why the query optimizer isn't smarter, I intentionally avoided asking the question that way because the answer would have been, "Because it isn't." (As Eric Lippert likes to point out about the C# language, features are not implemented *by default*, i.e. unless a deliberate effort is undertaken to implement them.) I figured it would be more worthwile asking the other way around, i.e. if there are reasons why this behaviour could actually make sense. — stakx - no longer contributing, Sep 17 '14 at 15:22

score 0 · Answer 3 · answered Sep 17 '14 at 12:03

0

In an nutshell the answer is that ISO Standard SQL is a strange beast. Microsoft is just following the standard that the IT industry has stuck doggedly to for the last 3 decades.

answered Sep 17 '14 at 12:03

nvogel

24,981
1
44
82

score 0 · Answer 4 · answered Sep 17 '14 at 12:39

SQL Server allows JOIN conditions that don't depend on the joined table at all

That's for full cross join :

insert into y values(1),(2);
insert into x values(1),(2);

Result of query 2:

YID
1
1

Query 1 and 3 must get the same result, but sql server is not clever enough to recognize it. That's why DBAs living for.

Are there any reasons why SQL Server allows JOIN conditions that don't depend on the joined table at all?

4 Answers4

Linked