1

In the following case:

CREATE TABLE Persons (
    groupId int,
    age int,
    Person varchar(255)
);

insert into Persons (Person, groupId, age) values('Bob'  , 1     , 32);
insert into Persons (Person, groupId, age) values('Jill'  , 1     , 34);
insert into Persons (Person, groupId, age)values('Shawn'  , 1     , 42);
insert into Persons (Person, groupId, age) values('Shawn'  , 1     , 42);
insert into Persons (Person, groupId, age) values('Jake'  , 2     , 29);
insert into Persons (Person, groupId, age) values('Paul'  , 2     , 36);
insert into Persons (Person, groupId, age) values('Laura'  , 2     , 39);

The following query:

SELECT *
FROM `Persons` o                    
  LEFT JOIN `Persons` b             
      ON o.groupId = b.groupId AND o.age < b.age

returns (executed in http://sqlfiddle.com/#!9/cae8023/5):

1   32  Bob 1   34  Jill
1   32  Bob 1   42  Shawn
1   34  Jill    1   42  Shawn
1   32  Bob 1   42  Shawn
1   34  Jill    1   42  Shawn
1   42  Shawn   (null)  (null)  (null)
1   42  Shawn   (null)  (null)  (null)
2   29  Jake    2   36  Paul
2   29  Jake    2   39  Laura
2   36  Paul    2   39  Laura
2   39  Laura   (null)  (null)  (null). 

I don't understand the result.
I was expecting

1   32  Bob 1   34  Jill
1   32  Bob 1   42  Shawn
1   34  Jill    1   42  Shawn
1   42  Shawn   (null)  (null)  (null)
2   29  Jake    2   36  Paul
2   29  Jake    2   39  Laura
2   39  Laura   (null)  (null)  (null)

Reason I was expecting that is that in my understanding the left join picks each row from the left table, tries to match it each row of the right table and if there is a match it adds the row. If there is no match in the condition it adds the left row with null values for the right columns.
So if that is correct why in the fiddle output we have after 1 34 Jill 1 42 Shawn rows for Bob and Jill repeated?

Jim
  • 3,845
  • 3
  • 22
  • 47
  • A left join is still a join and not a lookup and you have Shawn twice in your data. That means that both Bob and Jill are joined to each of the "two" Shawns. – gregor Nov 03 '22 at 14:52
  • @gregor: good catch. But I am still not sure on the ordering. `Shawn` is repeated twice but shouldn't the output have `1 32 Bob 1 42 Shawn` repeated in sequence? Now the repetition is after `1 34 Jill 1 42 Shawn`. Doesn't each row on the left scan each row on the right? – Jim Nov 03 '22 at 15:21
  • Does this answer your question? [What is the difference between "INNER JOIN" and "OUTER JOIN"?](https://stackoverflow.com/questions/38549/what-is-the-difference-between-inner-join-and-outer-join) – philipxy Nov 03 '22 at 16:24
  • LEFT JOIN ON returns INNER JOIN ON rows UNION ALL unmatched left table rows extended by NULLs. RIGHT JOIN ON returns INNER JOIN ON rows UNION ALL unmatched right table rows extended by NULLs. FULL JOIN ON returns INNER JOIN ON rows UNION ALL unmatched left & right table rows extended by NULLs. Always know what INNER JOIN ON you want as part of an OUTER JOIN ON. PS It's good that you say what you expect & why you say it but you need to justify the why by reference to authoritative documentation. PS Debug question require a [mre]. – philipxy Nov 03 '22 at 16:27

2 Answers2

1

Your condition for joining rows is that the groupId is equal and o.age < b.age.

Bob's age is 32. That is less than Jill's age of 34. It is also less than Shawn's age of 42. So the condition is satisfied in two pairings of joined rows.

The joined row has all the columns from the row referenced as o and all the columns from the row referenced as b.

Note that you have entered two rows for Shawn. Bob's row actually matches Jill's row and both rows for Shawn. So you get three rows for Bob.


When I test your query on my local MySQL instance (8.0.31), I get the result in the following order, which is different from your sqlfiddle's result:

+---------+------+--------+---------+------+--------+
| groupId | age  | Person | groupId | age  | Person |
+---------+------+--------+---------+------+--------+
|       1 |   32 | Bob    |       1 |   42 | Shawn  |
|       1 |   32 | Bob    |       1 |   42 | Shawn  |
|       1 |   32 | Bob    |       1 |   34 | Jill   |
|       1 |   34 | Jill   |       1 |   42 | Shawn  |
|       1 |   34 | Jill   |       1 |   42 | Shawn  |
|       1 |   42 | Shawn  |    NULL | NULL | NULL   |
|       1 |   42 | Shawn  |    NULL | NULL | NULL   |
|       2 |   29 | Jake   |       2 |   39 | Laura  |
|       2 |   29 | Jake   |       2 |   36 | Paul   |
|       2 |   36 | Paul   |       2 |   39 | Laura  |
|       2 |   39 | Laura  |    NULL | NULL | NULL   |
+---------+------+--------+---------+------+--------+

Without an explicit ORDER BY clause, the default behavior of InnoDB is to return rows in the order they are read from the index. In this case, it's using the primary key order for both tables, because there's no other index to optimize the join. You can see that the order of columns from the left table match the primary key order.

I'm not sure how to explain why the Bob-Shawn rows are before the Bob-Jill row, because that's not primary key order for the joined table. It could be that the order is messed up in the join buffer while doing an unindexed join.

The sqlfiddle might be doing something in the client that reorders rows.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • My undestanding is that we start each row from the left table to each row to the right. We start with Bob and it matches with Jill and Shawn. Then we move to Jill which matches with Shawn. Then shouldnt we move to Shawn? – Jim Nov 03 '22 at 14:45
  • See above edit, I noticed you have two rows for Shawn in your data. – Bill Karwin Nov 03 '22 at 14:59
  • Thank you! But I am still not sure on the ordering. `Shawn` is repeated twice but shouldn't the output have `1 32 Bob 1 42 Shawn` repeated in sequence? Now the repetition is after `1 34 Jill 1 42 Shawn`. Doesn't each row on the left scan each row on the right? – Jim Nov 03 '22 at 15:21
  • When I test on my local MySQL instance, I get a different order in the result. It seems to order the result as I would expect — first three rows for Bob, then two rows for Jill, then two rows for Shawn, etc. I don't know what the sqlfiddle site is doing. In any case, if you want a specific order, then use an ORDER BY query. – Bill Karwin Nov 03 '22 at 15:24
  • I don't want a specific order, I am only trying to check my understanding of how it works. Seeing the different order made me think I dont understand the process – Jim Nov 03 '22 at 15:25
  • `it's using the primary key order` the create table does not have a primary key defined. Which one is used? – Jim Nov 03 '22 at 15:43
  • https://dev.mysql.com/doc/refman/8.0/en/innodb-index-types.html _"If a table has no PRIMARY KEY or suitable UNIQUE index, InnoDB generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column that contains row ID values. The rows are ordered by the row ID that InnoDB assigns. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in order of insertion."_ – Bill Karwin Nov 03 '22 at 15:52
0

You inserted the record of (Shawn) twice. Your query should be :

CREATE TABLE Persons (
    groupId int,
    age int,
    Person varchar(255)
);

insert into Persons (Person, groupId, age) values('Bob'  , 1     , 32);
insert into Persons (Person, groupId, age) values('Jill'  , 1     , 34);
insert into Persons (Person, groupId, age)values('Shawn'  , 1     , 42);
insert into Persons (Person, groupId, age) values('Jake'  , 2     , 29);
insert into Persons (Person, groupId, age) values('Paul'  , 2     , 36);
insert into Persons (Person, groupId, age) values('Laura'  , 2     , 39);

SELECT *
FROM `Persons` o                    
  LEFT JOIN `Persons` b             
      ON o.groupId = b.groupId AND o.age < b.age
;

This will gives you the following results

1   32  Bob 1   34  Jill
1   32  Bob 1   42  Shawn
1   34  Jill    1   42  Shawn
1   42  Shawn   (null)  (null)  (null)
2   29  Jake    2   36  Paul
2   29  Jake    2   39  Laura
2   36  Paul    2   39  Laura
2   39  Laura   (null)  (null)  (null)
Ehab
  • 284
  • 1
  • 9