5

See the below 3 Counts, just to give you a brief idea about the table data. All skCitizen in [dbo].[LUEducation] is present in [dbo].[LUCitizen]

SELECT  COUNT(*) FROM   [dbo].[LUCitizen] --115000 ROWS
SELECT  COUNT(*) FROM   [dbo].[LUEducation]  --201846 ROWS

SELECT  COUNT(*) --212695 ROWS
FROM    [dbo].[LUCitizen] C
LEFT JOIN   [dbo].[LUEducation] E
ON      C.skCitizen = E.skCitizen

SELECT  COUNT(*) FROM   [dbo].[LUEducation] WHERE skSchool = 24417 --4 ROWS

See the below 2 queries,

SELECT  C.skCitizen,E.skCitizen
FROM    [dbo].[LUCitizen] C
LEFT JOIN   [dbo].[LUEducation] E
ON      C.skCitizen = E.skCitizen
WHERE   E.skSchool = 24417
--4 ROWS

SELECT  C.skCitizen,E.skCitizen
FROM    [dbo].[LUCitizen] C
LEFT JOIN   (SELECT * FROM [dbo].[LUEducation] WHERE skSchool = 24417) E
ON      C.skCitizen = E.skCitizen
--115000 ROWS

In the last 2 queries, the confusing query for me is the 1st one. There i expected 115000 rows, but only 4 rows displayed. According to my understanding, Full rows from [dbo].[LUCitizen] will be displayed, then 4 rows from [dbo].[LUEducation] will be LEFT Joined.

Why is the 2 Queries different?

Pardon me , if this is a duplicate question.

HHH
  • 197
  • 1
  • 8
  • 1
    That's because the one returning 4 rows is implicitly converting into a regular `(INNER) JOIN`. After the `LEFT JOIN`, there's a bunch of rows where `e.skSchool IS NULL`; since `null <> 24417`, it throws those out.... there has to be a duplicate for this somewhere... – Clockwork-Muse Aug 25 '14 at 12:14
  • possible duplicate of [Left Join With Where Clause](http://stackoverflow.com/questions/4752455/left-join-with-where-clause) – Clockwork-Muse Aug 25 '14 at 12:17
  • 2
    @Clockwork-Muse: You're right about the rows, but it's not correct to say that `null <> 24417`. The result of `true <> 24417` is `unknown`, which is [neither `true` nor `false`](http://en.wikipedia.org/wiki/Three-valued_logic) – Andomar Aug 26 '14 at 06:22
  • @Andomar - you're right, that's a more nuanced version. – Clockwork-Muse Aug 26 '14 at 08:18

2 Answers2

8

When you do this:

SELECT  C.skCitizen,E.skCitizen
FROM    [dbo].[LUCitizen] C
LEFT JOIN   [dbo].[LUEducation] E
ON      C.skCitizen = E.skCitizen
WHERE   E.skSchool = 24417;

You are turning the left join into an inner join, because E.skSchool is NULL for non-matching rows. The correct way to put a condition on the second table in a left join is to use the on clause:

SELECT  C.skCitizen,E.skCitizen
FROM    [dbo].[LUCitizen] C
LEFT JOIN   [dbo].[LUEducation] E
ON      C.skCitizen = E.skCitizen AND E.skSchool = 24417;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
3

If the left join fails to find a match in E, columns from E receive a null value. Then the where clause:

E.skSchool = 24417

Becomes:

null = 24417

Which is not true. So it will filter out all rows.

Andomar
  • 232,371
  • 49
  • 380
  • 404