7

Why are the following queries different? I want a LEFT OUTER join, but need to filter the children with a condition. I thought these queries were essentially the same (just different syntax), but I get different results if I put the condition in ON versus WHERE:

-- Query 1: Filter in WHERE
SELECT  p.ID, p.Name, c.ID, c.Name, c.ParentID
FROM    @Parent p
  LEFT OUTER JOIN @Child c
    ON (p.ID = c.ParentID)
WHERE   c.ID IS NULL OR c.Name = 'T';

-- Query 2: Filter in ON
SELECT  p.ID, p.Name, c.ID, c.Name, c.ParentID
FROM    @Parent p
  LEFT OUTER JOIN @Child c
    ON (p.ID = c.ParentID AND c.Name = 'T');

I started with Query 2 but it showed all of the parents in the results, not the subset with matching children, so I switched to Query 1. Here is an example:

DECLARE @Parent TABLE (
     ID           int           IDENTITY(1, 1) PRIMARY KEY
  ,  Name         nvarchar(40)  NOT NULL
);

DECLARE @Child TABLE (
     ID           int           IDENTITY(1, 1) PRIMARY KEY
  ,  Name         nvarchar(40)  NOT NULL
  ,  ParentID     int               NULL
);

-- Parents
INSERT  @Parent (Name)
VALUES  ('A'), ('B'), ('C'), ('D')
;

-- Children: permutations to parents.
-- NOTE: 'D' has no children
INSERT  @Child (Name, ParentID)
VALUES  ('T', 1)
    ,             ('U', 2)
    ,   ('V', 1), ('V', 2)
    ,                       ('W', 3)
    ,   ('X', 1),           ('X', 3)
    ,             ('Y', 2), ('Y', 3)
    ,   ('Z', 1), ('Z', 2), ('Z', 3)
;

-- Query 1: Filter in WHERE
SELECT  p.ID, p.Name, c.ID, c.Name, c.ParentID
FROM    @Parent p
  LEFT OUTER JOIN @Child c
    ON (p.ID = c.ParentID)
WHERE   c.ID IS NULL OR c.Name = 'T';

-- Query 2: Filter in ON
SELECT  p.ID, p.Name, c.ID, c.Name, c.ParentID
FROM    @Parent p
  LEFT OUTER JOIN @Child c
    ON (p.ID = c.ParentID AND c.Name = 'T');

Query 1: Results

ID Name ID Name ParentID
1 A 1 T 1
4 D NULL NULL NULL

Query 2: Results

ID Name ID Name ParentID
1 A 1 T 1
2 B NULL NULL NULL
3 C NULL NULL NULL
4 D NULL NULL NULL

I assumed the queries would return the same results and I was surprised when they didn't. I prefer the style of query 2 (and I think it is more optimal), but I thought the queries would return the same results.

(NOTE: The SQL example with data was added much later for clarification as to why this question is not a duplicate of another question, and to bring it up to current question standards. The sample results make it much clearer that Query 1 returns the parents with 1 or more matching children and parents with no children. Query 2 returns all parents but only matching children. Obviously I understand the difference between the queries now.)

Edit/Summary:

There were some great answers provided here. I had a hard time choosing to whom to award the answer. I decided to go with mdma since it was the first answer and one of the clearest. Based on the supplied answers, here is my summary:

Possible results:

  • A: Parents with no children
  • B: Parents with children
  • |-> B1: Parents with children where no child matches the filter
  • \-> B2: Parents with children where 1 or more match the filter

Query results:

  • Query 1 returns (A, B2)
  • Query 2 returns (A, B1, B2)

Query 2 always returns a parent because of the left join. In query 1, the WHERE clause is performed after the left join, so parents with children where none of the children match the filter are excluded (case B1).

Note: only parent information is returned in case B1, and in case B2 only the parent/child information matching the filter is returned.
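
The summary above can be checked mechanically. The following is a minimal sketch, not the original T-SQL: it assumes Python's built-in sqlite3 module as a stand-in for SQL Server, so the table variables @Parent and @Child become ordinary tables named Parent and Child. The helper names `classify` and `returned_cases` are made up for illustration. It loads the question's sample data, assigns each parent to case A, B1, or B2, and reports which cases each query returns.

```python
import sqlite3

# Assumption: SQLite replaces SQL Server, plain tables replace table variables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parent (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Child  (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL,
                         ParentID INTEGER);
    INSERT INTO Parent (Name) VALUES ('A'), ('B'), ('C'), ('D');
    INSERT INTO Child (Name, ParentID) VALUES
        ('T', 1), ('U', 2), ('V', 1), ('V', 2), ('W', 3), ('X', 1),
        ('X', 3), ('Y', 2), ('Y', 3), ('Z', 1), ('Z', 2), ('Z', 3);
""")

# Query 1: filter in WHERE (ORDER BY added only to make the output stable).
QUERY_1 = """
    SELECT p.ID, p.Name, c.ID, c.Name, c.ParentID
    FROM Parent p
    LEFT OUTER JOIN Child c ON (p.ID = c.ParentID)
    WHERE c.ID IS NULL OR c.Name = 'T'
    ORDER BY p.ID, c.ID
"""

# Query 2: filter in ON.
QUERY_2 = """
    SELECT p.ID, p.Name, c.ID, c.Name, c.ParentID
    FROM Parent p
    LEFT OUTER JOIN Child c ON (p.ID = c.ParentID AND c.Name = 'T')
    ORDER BY p.ID, c.ID
"""


def classify(parent_id):
    """Return the summary's case label (A, B1 or B2) for one parent."""
    names = [row[0] for row in conn.execute(
        "SELECT Name FROM Child WHERE ParentID = ?", (parent_id,)).fetchall()]
    if not names:
        return "A"                          # no children at all
    return "B2" if "T" in names else "B1"   # some / no child matches the filter


def returned_cases(sql):
    """Return the sorted case labels of the parents that a query returns."""
    rows = conn.execute(sql).fetchall()
    return sorted({classify(row[0]) for row in rows})


query1_rows = conn.execute(QUERY_1).fetchall()
query2_rows = conn.execute(QUERY_2).fetchall()

print(returned_cases(QUERY_1))  # ['A', 'B2']
print(returned_cases(QUERY_2))  # ['A', 'B1', 'B2']
```

Parent D is the only case A row, parents B and C are case B1, and parent A is case B2, which is why Query 1 keeps only parents A and D.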

HLGEM provided a good link (now dead, so using archive.org):

https://web.archive.org/web/20180814131549/http://wiki.lessthandot.com/index.php/WHERE_conditions_on_a_LEFT_JOIN

Ryan
  • Does this answer your question? [What is the difference between "INNER JOIN" and "OUTER JOIN"?](https://stackoverflow.com/questions/38549/what-is-the-difference-between-inner-join-and-outer-join) – philipxy Mar 04 '21 at 11:40
  • @philipxy No, that question is about the difference between INNER/OUTER joins. This question is about putting the outer join filtering criteria in `ON` versus `WHERE` and how it affects the results. That question does not answer this question. – Ryan Mar 08 '21 at 14:23
  • Read the 1st sentence of your post. Read the 1st bullet. If you expect the position of where vs on to not matter then you don't understand left join. PS Please ask 1 question per post. PS If you get unexpected results, chop your code to the smallest possible with that error, which means first chop it until it gives expected results then add minimal back to get the problem, and say what you expected instead & why, justified by reference to authoritative documentation. Code questions require a [mre]. (Basic debugging.) – philipxy Mar 08 '21 at 14:57
  • I see now this was posted a decade ago. So presumably you would ask a better question nowadays. It was still a duplicate when posted. – philipxy Mar 08 '21 at 15:00
  • @philipxy Yes, this was more than a decade ago... I am old enough that they still taught old style SQL joins in school: `WHERE Parent.ID *= child.ParentID`. I was wondering why I was getting different results in ON vs WHERE because I thought they were equivalent and just syntax difference, but obviously they are not. I tried to add an example today in SqlFiddle, but it was not working. I'll edit the question to make it better to today's standards. – Ryan Mar 08 '21 at 17:46
  • 1
    @philipxy Your tone seems very aggressive for a question asked more than a decade ago. Plus quoting the minimal reproducible example... That didn't even exist for many years of SO. Much has changed since this question was asked and the beginning of SO. – Ryan Mar 08 '21 at 18:10
  • @philipxy I still maintain this is not a duplicate of the question you posted. The answer by Martin Smith (https://stackoverflow.com/a/27458534/29762) does cover filtering in ON versus WHERE, but many of the other answers do not, and that answer is not even the accepted answer. However that question is very broad (it would never be allowed today) and doesn't *directly* address the specifics of my question. – Ryan Mar 08 '21 at 20:02

7 Answers

12

Yes, there is a huge difference. When you place filters in the ON clause on a LEFT JOIN, the filter is applied before the results are joined to the outer table. When you apply a filter in the WHERE clause, it happens after the LEFT JOIN has been applied.

In short, the first query will exclude rows where there are child rows but the child description is not equal to the filter condition, whereas the second query will always return a row for the parent.
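
One way to picture "applied before the join" is to move the ON filter into a derived table that is filtered first and joined second. This is a minimal sketch under stated assumptions: Python's built-in sqlite3 module stands in for SQL Server (so @Parent and @Child become plain tables), and the data is a cut-down, hypothetical set of three parents and three children. Because the extra ON condition refers only to child columns, the two forms should return the same rows here.

```python
import sqlite3

# Assumption: SQLite in place of SQL Server; a small hypothetical data set.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parent (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Child  (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL,
                         ParentID INTEGER);
    INSERT INTO Parent (ID, Name) VALUES (1, 'A'), (2, 'B'), (4, 'D');
    INSERT INTO Child (ID, Name, ParentID)
        VALUES (1, 'T', 1), (2, 'U', 2), (3, 'V', 1);
""")

# Filter in ON: every parent survives, non-matching children become NULLs.
filter_in_on = conn.execute("""
    SELECT p.ID, p.Name, c.ID, c.Name, c.ParentID
    FROM Parent p
    LEFT OUTER JOIN Child c ON (p.ID = c.ParentID AND c.Name = 'T')
    ORDER BY p.ID, c.ID
""").fetchall()

# Same idea spelled out: filter the children first, then left join.
prefiltered = conn.execute("""
    SELECT p.ID, p.Name, c.ID, c.Name, c.ParentID
    FROM Parent p
    LEFT OUTER JOIN (SELECT * FROM Child WHERE Name = 'T') c
        ON (p.ID = c.ParentID)
    ORDER BY p.ID, c.ID
""").fetchall()

# Filter in WHERE: runs on the joined rows, so parent B (only child 'U')
# is removed entirely.
filter_in_where = conn.execute("""
    SELECT p.ID, p.Name, c.ID, c.Name, c.ParentID
    FROM Parent p
    LEFT OUTER JOIN Child c ON (p.ID = c.ParentID)
    WHERE c.ID IS NULL OR c.Name = 'T'
    ORDER BY p.ID, c.ID
""").fetchall()

print(filter_in_on == prefiltered)                  # True
print([row[1] for row in filter_in_on])             # ['A', 'B', 'D']
print([row[1] for row in filter_in_where])          # ['A', 'D']
```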

Thomas
  • Thanks. I understand the difference in terms of performance, but I am more curious about the differing resultsets. – Ryan May 21 '10 at 15:09
  • @Ryan - It is not about performance. Where the filtering is applied can make all the difference in terms of the proper resultset. – Thomas May 21 '10 at 15:21
9

The first query will return cases where the parent has no children or where some of the children match the filter condition. Specifically, cases where the parent has children, but none of them match the filter condition, will be omitted.

The second query will return a row for all parents. If there is no match on the filter condition, a NULL will be returned for all of c's columns. This is why you are getting more rows in query 2: parents with children that don't match the filter condition are output with NULL child values, whereas in the first query they are filtered out.

mdma
3

Putting the condition in the WHERE clause converts it to an inner join (unless you are using something like WHERE id IS NULL, which gives you records not in the table). See this for a fuller explanation:

http://wiki.lessthandot.com/index.php/WHERE_conditions_on_a_LEFT_JOIN

HLGEM
  • 1
    +1, though this of course depends on the condition. Say, placing a condition like `id IS NULL` converts a `LEFT JOIN` to a `NOT EXISTS` (and most engines even optimize it correctly). Any equality or inequality would of course convert it into an inner join. – Quassnoi May 21 '10 at 15:40
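
Both rewrites mentioned here can be tried on a small data set. This is a rough sketch, assuming Python's built-in sqlite3 module rather than SQL Server, plain tables instead of table variables, and a hypothetical three-parent data set. The `rows` helper is invented for brevity. It only compares result sets and says nothing about how any particular engine optimizes these queries.

```python
import sqlite3

# Assumption: SQLite in place of SQL Server; a small hypothetical data set.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parent (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Child  (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL,
                         ParentID INTEGER);
    INSERT INTO Parent (ID, Name) VALUES (1, 'A'), (2, 'B'), (4, 'D');
    INSERT INTO Child (ID, Name, ParentID)
        VALUES (1, 'T', 1), (2, 'U', 2), (3, 'V', 1);
""")


def rows(sql):
    """Run a query and return all of its rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# An equality test on a child column in WHERE rejects the NULL-extended
# rows, so the LEFT JOIN returns the same rows as an INNER JOIN.
left_where_eq = rows("""
    SELECT p.ID, c.ID FROM Parent p
    LEFT JOIN Child c ON p.ID = c.ParentID
    WHERE c.Name = 'T'
    ORDER BY p.ID, c.ID
""")
inner_eq = rows("""
    SELECT p.ID, c.ID FROM Parent p
    INNER JOIN Child c ON p.ID = c.ParentID
    WHERE c.Name = 'T'
    ORDER BY p.ID, c.ID
""")

# Keeping only the NULL-extended rows returns the parents without
# children, the same set as a NOT EXISTS subquery.
left_where_null = rows("""
    SELECT p.ID FROM Parent p
    LEFT JOIN Child c ON p.ID = c.ParentID
    WHERE c.ID IS NULL
    ORDER BY p.ID
""")
not_exists = rows("""
    SELECT p.ID FROM Parent p
    WHERE NOT EXISTS (SELECT 1 FROM Child c WHERE c.ParentID = p.ID)
    ORDER BY p.ID
""")

print(left_where_eq == inner_eq)      # True
print(left_where_null == not_exists)  # True
```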
2

For this recordset:

parent

id
1

child

id    parent condition
1     1      OtherCondition
2     1      OtherCondition

The first query would return 0 records, while the second one would return 1 record. Start with the join alone, before any filter:

WITH    parent (id) AS
        (
        SELECT  1
        ),
        child (id, parent, condition) AS
        (
        SELECT  1, 1, 'OtherCondition'
        UNION ALL
        SELECT  2, 1, 'OtherCondition'
        )
SELECT  *
FROM    parent
LEFT JOIN
        child
ON      child.parent = parent.id   

/* The children are found, so no fake NULL records returned */

1   1   1   OtherCondition
1   2   1   OtherCondition

Now add the WHERE clause:

WITH    parent (id) AS
        (
        SELECT  1
        ),
        child (id, parent, condition) AS
        (
        SELECT  1, 1, 'OtherCondition'
        UNION ALL
        SELECT  2, 1, 'OtherCondition'
        )
SELECT  *
FROM    parent
LEFT JOIN
        child
ON      child.parent = parent.id       
WHERE   child.id IS NULL OR child.condition = 'FilterCondition'

The WHERE clause filters the records returned by the previous step, and no record matches the condition.

While this one:

WITH    parent (id) AS
        (
        SELECT  1
        ),
        child (id, parent, condition) AS
        (
        SELECT  1, 1, 'OtherCondition'
        UNION ALL
        SELECT  2, 1, 'OtherCondition'
        )
SELECT  *
FROM    parent
LEFT JOIN
        child
ON      child.parent = parent.id       
        AND child.condition = 'FilterCondition'

1   NULL    NULL    NULL

returns a single fake record.

Quassnoi
  • Great detailed example. Very nice. I don't know if I would call it a 'fake' record though since it is a left join and you would expect the outer table columns be NULL at times. I would probably say query 2 always returns a parent while query 1 returns parents that either have no children or parents with children that explicitly match the filter condition. Many thanks for the example. – Ryan May 21 '10 at 15:44
1

I notice a couple of differences that can make the results vary. In the first query, you have LEFT OUTER JOIN Child c ON (p.ID = c.ParentID), and in the second query you have LEFT OUTER JOIN Child c ON (p.ID = c.ParentID AND c.Description = 'FilterCondition'). This makes the second query return every parent, with child columns filled in only for children satisfying your condition, whereas the first query returns only the parents with no children or with a matching child. Also look at the precedence of join conditions and WHERE conditions.

Srikar Doddi
1

The parents that only have children with description != 'FilterCondition' won't appear in query 1 because the WHERE clause is evaluated after the rows are joined.

Vincent Malgrat
1

The first query returns fewer rows because it returns only parents that either have no children or have children that match the filter condition.

The WHERE clause excludes the rest (those that DO have children, none of which match the filter condition).

The 2nd query shows all three conditions above.

LesterDove