2

I just found a great page with Venn diagrams of different joins and the code for executing them:
http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins

I used the "Right Excluding Join" in my query, the Venn diagram looks like this:

right excluding join diagram

and here is the code:

SELECT subjects.subject
FROM sold_subjects
RIGHT JOIN subjects
ON sold_subjects.subject = subjects.subject
WHERE sold_subjects.subject IS NULL 

I am asking for an explanation of what this code actually does, particularly what happens in the last row. I understand that we are joining the two relations where they have the same subject, but what happens when we set subjects for one of the relations to NULL in the last row?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Sahand
  • 7,980
  • 23
  • 69
  • 137
  • This is one of the most frequently linked pages for SQL questions on SO. [Check out these search results.](http://stackoverflow.com/search?q=url%3A%22http%3A%2F%2Fwww.codeproject.com%2FArticles%2F33052%2FVisual-Representation-of-SQL-Joins%22) – Erwin Brandstetter Sep 12 '15 at 22:20

3 Answers3

3

First, what do JOIN and RIGHT JOIN do?

The JOIN gets information from two tables and joins them according to rules you specify in the ON or WHERE clauses.

The JOIN modifiers, such as LEFT, INNER, OUTER and RIGHT control the behavior you JOIN will have in case of unmatched records -- when no record in A matches a record in B according to the specified rules, and vice-versa.

To understand this part, take table A as being the left table and table B as being the right one. When you have multiple joins, the right table in each join is the one whose name is immediately right of the JOIN command.

e.g. FROM a1 LEFT JOIN ... LEFT JOIN b

The b table is the right one and whatever comes before is the left one.

This is a summary of the modifiers' behavior:

  • LEFT: preserves unmatched records in the left table, discards those in the right table;
  • RIGHT: preserves unmatched records in the right table, discards those in the left table;
  • INNER: preserves only the records that are matched, discards unmatched from both tables;
  • OUTER or FULL: preserves all records, regardless of matches.

What is visually happening?

Imagine you have two simple tables with the same names of the ones you put in there.

sold_subjects               subjects
subject                     subject
   1                           1
   2                           4
   3                           5
   4                           6

When you RIGHT JOIN two tables, you create a third one that looks like this:

joined_table
sold_subjects.subject    subjects.subject
   1                          1
   4                          4
  NULL                        5
  NULL                        6

Please note that the subjects 2 and 3 are already gone in this subset.

When you add a WHERE clause with sold_subjects.subject IS NULL, you are only keeping the last two lines where there was no match in subjects.

Paulo Avelar
  • 2,140
  • 1
  • 17
  • 31
2

The right join makes sure that you will keep all the records of the right table. If there is no match with the left table, then all the variables in the result originating from the left table will be null (because there is no match).

The where clause checks whether the value of lefttable.subject is null or not. If it's not null, then obviously the join succeeded. If it is null, then the join did not work, leaving this value blank. So this where clause will, per definition, return all the records of the right table that have no match in the left table, which is exactly what the venn diagram says!

This is a very common practice in SQL, there are may use cases. For example: left table is sales, right table is customers, and you want to know all the customers without sales.

Tamara
  • 65
  • 1
  • 11
2

RIGHT JOIN is shorthand for RIGHT OUTER JOIN.
Consider the excellent explanation in the fine manual:

LEFT OUTER JOIN returns all rows in the qualified Cartesian product (i.e., all combined rows that pass its join condition), plus one copy of each row in the left-hand table for which there was no right-hand row that passed the join condition. This left-hand row is extended to the full width of the joined table by inserting null values for the right-hand columns. Note that only the JOIN clause's own condition is considered while deciding which rows have matches. Outer conditions are applied afterwards.

Conversely, RIGHT OUTER JOIN returns all the joined rows, plus one row for each unmatched right-hand row (extended with nulls on the left). This is just a notational convenience, since you could convert it to a LEFT OUTER JOIN by switching the left and right tables.

Bold emphasis mine. Your query is just one way to exclude rows that are not present in another table, with a shiny buzz word attached ("Right Excluding JOIN"). There are others:

Now, for the tricky part - or where you deviate from the original:

But what happens when we set subjects for one of the relations to NULL in the last row?

Your query has:

WHERE sold_subjects.subject IS NULL 

Where the original says:

WHERE A.Key IS NULL

Key is supposed to imply NOT NULL. The query simply does not work if either of the underlying table columns sold_subjects.subject or subjects.subject can be NULL. There would be no way to disambiguate how the row qualified:

  • subjects.subject IS NULL and no row with NULL in sold_subjects.subject
  • subjects.subject IS NULL and some row with NULL in sold_subjects.subject
  • subjects.subject IS NOT NULL but no matching row in sold_subjects

If one of the linking columns can be NULL, and you want to treat NULL values like they were actual values (which they are not), i.e. match NULL to NULL, you could substitute with an anti-join using the NULL-safe operator IS NOT DISTINCT FROM:

SELECT s.subject
FROM   subjects s
LEFT   JOIN sold_subjects ss ON ss.subject IS NOT DISTINCT FROM s.subject
WHERE  ss.subject IS NULL;

Also with shorter syntax, using the more commonly used LEFT JOIN, but otherwise identical. IS NOT DISTINCT FROM is often slower than a simple =, only use it where you need it. Typically, you join tables on key columns that are defined NOT NULL - implicitly (a PK column is NOT NULL automatically) or explicitly.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228