-2

Here is my table structure: SQL Fiddle

CREATE TABLE mytable (
    id int,
    related int
);

INSERT into mytable VALUES(1, NULL);
INSERT into mytable VALUES(2, 1);
INSERT into mytable VALUES(3, 1);

And I have two queries:

-- returns 3 rows
SELECT t1.id, t2.id
FROM mytable as t1 
LEFT JOIN mytable as t2 ON t1.related = t2.id;

-- returns 4 rows
SELECT t1.id, t2.id
FROM mytable as t1 
LEFT JOIN mytable as t2 ON t1.id = t2.related;

Those queries are almost similar and that's a self-join. But as you can see in the fiddle, the first query returns 3 rows and the second one returns 4 rows. That not what I've expected. I think the result should be exactly the same.

Can you please explain me how LEFT JOIN works in the background?

Joe
  • 616
  • 2
  • 12
  • 27
Martin AJ
  • 6,261
  • 8
  • 53
  • 111
  • `Id` 1 appears twice in `related`. However `related` 1 has only 1 `id` it can join to. Sometimes it helps me to draw the tables out manually, in Excel for example, and manually apply the join (by dragging the cells around). You can also expand the select clause to include all rows. That should highlight what's happening here. – David Rushton Jan 30 '18 at 09:36
  • 1
    There's a really good illustration of how joins work here. Really worth a look https://stackoverflow.com/a/14011879/7858451 – Joe Jan 30 '18 at 09:43
  • Possible duplicate of [What is the difference between "INNER JOIN" and "OUTER JOIN"?](https://stackoverflow.com/questions/38549/what-is-the-difference-between-inner-join-and-outer-join) – philipxy Jan 30 '18 at 10:20
  • Left join returns inner join rows plus unmatched left table rows extended by nulls. It's not clear how you could think those must return the same result (ignoring column order) since the left tables use different columns so can match different values/rows. See the duplicate link. Be sure to read all my comments there about the bad answers & Venn nonsense. (Also at Joe J's link.) – philipxy Jan 30 '18 at 10:30
  • 1
    Perhaps you should start by explaining what You -think- a `LEFT JOIN` should be doing. Then we can explain where you have some misconceptions. – MatBailie Jan 30 '18 at 10:36

5 Answers5

1

LEFT JOIN means to grab all the rows from the left table, and only if there is a match from the right table, to return those. If there is no match, NULL is returned.

Let's look at what the dataset looks like when joined to itself with no condition. (Note, the asterisks and pluses are referred to below.)

 +-------+------------+-------+------------+
 | t1.id | t1.related | t2.id | t2.related |
 +-------+------------+-------+------------+
 |     1 | NULL       |     1 | NULL       |
+|     1 | NULL       |     2 | 1          |
+|     1 | NULL       |     4 | 1          |
*|     2 | 1          |     1 | NULL       |
 |     2 | 1          |     2 | 1          |
 |     2 | 1          |     4 | 1          |
*|     4 | 1          |     1 | NULL       |
 |     4 | 1          |     2 | 1          |
 |     4 | 1          |     4 | 1          |
 +-------+------------+-------+------------+

The left table is t1 in both cases.

In the first query, we look for matches of t1.related = t2.id (the middle two columns). That corresponds to the rows marked with an asterisk. However, there was no match for t.id = 1, so we need to include this, but with NULL because that's what LEFT JOIN means (no match still returns NULL).

+-------+-------+
| t1.id | t2.id |
+-------+-------+
|     1 | NULL  | added because LEFT JOIN
|     2 |    1  | marked with * in table above
|     4 |    1  | marked with * in table above
+-------+-------+

In the second query, we look for matches of t1.id = t2.related (the outer two columns). That corresponds to the rows marked with a plus. However, there was no match for t.id = 2 and t.id = 4, so we need to include these, but with NULL because that's what LEFT JOIN means (no match still returns NULL).

+-------+-------+
| t1.id | t2.id |
+-------+-------+
|     1 |    2  | marked with + in table above
|     1 |    4  | marked with + in table above
|     2 | NULL  | added because LEFT JOIN
|     4 | NULL  | added because LEFT JOIN
+-------+-------+
sunil
  • 556
  • 6
  • 20
1

First query: t1.related = t2.id

t1              joined t2
id   related  | id   related
--------------+-------------
1    NULL     | --   --
2    1        | 1    NULL
3    1        | 1    NULL

An inner join would result in only two rows, but the outer join also preserves the first row that has no match.

Second query: t1.id = t2.related

t1              joined t2
id   related  | id   related
--------------+-------------
1    NULL     | 2    1
1    NULL     | 3    1
2    1        | --   --
3    1        | --   --

Here too, an inner join would result in only two rows, but the outer join also preserves the two rows that have no match.

Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
0

In first case, we need to match values 1, 2, 3 with NULL, 1 and 1. Since it is left join, NULL will stay with no match and 1's will be matched with 1 from other table, thus 3 records.

In the second case, we have values 1, 2, 3. 2 and 3 have no match and will result in two rows, but 1 has 2 matches and will result in 2 additional rows, which is 4 rows.

Generally, having:

... LeftTable [LT] left join RightTable [RT] on [LT].[joinCol] = [RT].pjoinCol] ...

will work like this:

take all values from LT.joinCol, try to match with values in RT.joinCol. If some value has n matches in RT.joinCol, then it will result in n rows. If the row has no match, it will still result in one, un-matched record.

In your 1st case, 2 values have 1 match => 1 + 1 = 2 records. One value has no match => 1 record, 2 + 1 = 3.

In your 2nd case, 2 values have no match => thus 2 records, one value has 2 matches => 2 records, 2 + 2 = 4 :)

Michał Turczyn
  • 32,028
  • 14
  • 47
  • 69
0

The best way to view self join is create two tables and then view the joining conditions.

Table t1

    Id   Related  
    1       null  
    2       1  
    4       1  

Table t2

    Id   Related  
    1    null  
    2    1  
    4    1  

Note: Left Join means every thing from left table will come even if joining condition does not match. From the right table it will come as null.

First Query: t1.related = t2.id; (Columns selected "t1.id, t2.id")

1.) Lets take first row from t1 table and related column has null value. null has not match in id column of t2 table. As it is a left join, row will come from t1 table.

First Row:

   t1_id t2_id  
    1     null  

2.) Lets take second row from t1 table and related column has 1. 1 has one match in id column of t2 table. So one row comes in join condition.

Second Row:

    t1_id t2_id  
    2     1   

3.) Lets take third row from t1 table and related column has 1. 1 has one match in id column of t2 table. So one row comes in join condition.

Third Row:

    t1_id t2_id  
     4    1  

Second Query t1.id = t2.related (Columns selected "t1.id, t2.id")

1.) Lets take first row from t1 table and id column has 1. 1 has 2 rows in related column of t2 table. so two rows are selected.

   t1.id   t2.id  
    1        2  
    1        4

2.) Lets take second row from t1 table and id column has 2. 2 has 0 row in related column of t2 table. But it a left join row will come fro t1 table.

    t1.id   t2.id  
     1       2  
     1       4  
     2      null  

2.) Lets take third row from t1 table and id column has 4. 4 has 0 row in related column of t2 table. But it a left join row will come fro t1 table.

    t1.id   t2.id  
     1        2  
     1        4  
     2       null   
     4       null

Hope this will make you understand.

Thanks Ankit.

sda11
  • 61
  • 1
  • 6
Ankit Mongia
  • 200
  • 2
  • 11
0

LEFT JOIN return all tuple from left table even if no matches in right table plus the matched value of right table.

    SELECT t1.id as t1_id, t1.related as t1_r, t2.id as t2_id, t2.related as t2_r
    FROM mytable as t1 
    LEFT JOIN mytable as t2 ON t1.related = t2.id;

returns

 t1_id  t1_r    t2_id   t2_r
 ----------------------------
   1    null    null    null
   2     1        1     null
   4     1        1     null

(1, null) tuple of t1 matches no tuple from t2, (2, 1) tuple of t1 matches one tuple of t2 (1, null), so as (4, 1) hence 3 rows in result

Where as

   SELECT t1.id as t1_id, t1.related as t1_r, t2.id as t2_id, t2.related as t2_r
   FROM test1 as t1 
   LEFT JOIN test1 as t2 ON t1.id  = t2.related ;

returns

   t1_id    t1_r    t2_id   t2_r
   -----------------------------
       1    null       2    1
       1    null       4    1
       2      1     null    null
       4      1     null    null

here (1, null) of t1 matches two tuple of t2 (2, 1) and (4, 1) and (2, 1) and (4, 1) matches no tuple hence 4 rows

sda11
  • 61
  • 1
  • 6