example (not using your example tables :-)
I have a car rental company.
Table car
id: integer primary key autoincrement
licence_plate: varchar
purchase_date: date
Table customer
id: integer primary key autoincrement
name: varchar
Table rental
id: integer primary key autoincrement
car_id: integer
bike_id: integer
customer_id: integer
rental_date: date
Simple right? I have 10 records for cars because I have 10 cars.
I've been running this business for 10 years, so I've got 1000 customers.
And I rent the cars about 20x per year per cars = 10 years x 10 cars x 20 = 2000 rentals.
If I store everything in one big table I've got 10x1000x2000 = 20 million records.
If I store it in 3 tables I've got 10+1000+2000 = 3010 records.
That's 3 orders of magnitude, so that's why I use 3 tables.
But because I use 3 tables (to save space and time) I have to use joins in order to get the data out again
(at least if I want names and licence plates instead of numbers).
Using inner joins
All rentals for customer 345?
SELECT * FROM customer
INNER JOIN rental on (rental.customer_id = customer.id)
INNER JOIN car on (car.id = rental.car_id)
WHERE customer.id = 345.
That's an INNER JOIN
, because we only want to know about cars linked to
rentals linked to
customers that actually happened.
Notice that we also have a bike_id, linking to the bike table, which is pretty similar to the car table but different.
How would we get all bike + car rentals for customer 345.
We can try and do this
SELECT * FROM customer
INNER JOIN rental on (rental.customer_id = customer.id)
INNER JOIN car on (car.id = rental.car_id)
INNER JOIN bike on (bike.id = rental.bike_id)
WHERE customer.id = 345.
But that will give an empty set!!
This is because a rental can either be a bike_rental OR a car_rental, but not both at the same time.
And the non-working inner join
query will only give results for all rentals where we rent out both a bike and a car in the same transaction.
We are trying to get and boolean OR
relationship using a boolean AND
join.
Using outer joins
In order to solve this we need an outer join
.
Let's solve it with left join
SELECT * FROM customer
INNER JOIN rental on (rental.customer_id = customer.id) <<-- link always
LEFT JOIN car on (car.id = rental.car_id) <<-- link half of the time
LEFT JOIN bike on (bike.id = rental.bike_id) <<-- link (other) 0.5 of the time.
WHERE customer.id = 345.
Look at it this way. An inner join
is an AND
and a left join
is a OR
as in the following pseudocode:
if a=1 AND a=2 then {this is always false, no result}
if a=1 OR a=2 then {this might be true or not}
If you create the tables and run the query you can see the result.
on terminology
A left join
is the same as a left outer join
.
A join
with no extra prefixes is an inner join
There's also a full outer join
. In 25 years of programming I've never used that.
Why Left join
Well there's two tables involved. In the example we linked
customer to rental with an inner join
, in an inner join both tables must link so there is no difference between the left:customer
table and the right:rental
table.
The next link was a left join
between left:rental
and right:car
. On the left side all rows must link and the right side they don't have to. This is why it's a left join