Purpose of Self-Joins

Question

I am learning to program with SQL and have just been introduced to self-joins. I understand how these work, but I don't understand what their purpose is besides a very specific usage, joining an employee table to itself to neatly display employees and their respective managers.

This usage can be demonstrated with the following table:

EmployeeID | Name          | ManagerID
    ------ | ------------- | ------
    1      | Sam           | 10
    2      | Harry         | 4
    4      | Manager       | NULL
   10      | AnotherManager| NULL

And the following query:

select worker.employeeID, worker.name, worker.managerID, manager.name
from employee worker join employee manager
on (worker.managerID = manager.employeeID);

Which would return:

Sam   AnotherManager
Harry Manager

Besides this, are there any other circumstances where a self-join would be useful? I can't figure out a scenario where a self-join would need to be performed.

Related: http://stackoverflow.com/questions/2458519/explanation-of-self-joins?noredirect=1&lq=1, http://stackoverflow.com/questions/3362038/what-is-self-join-and-when-would-you-use-it. — Nivas, Apr 19 '17 at 17:14

score 3 · Answer 1 · answered Apr 19 '17 at 17:19

Your example is a good one. Self-joins are useful whenever a table contains a foreign key into itself. An employee has a manager, and the manager is... another employee. So a self-join makes sense there.

Many hierarchies and relationship trees are a good fit for this. For example, you might have a parent organization divided into regions, groups, teams, and offices. Each of those could be stored as an "organization", with a parent id as a column.

Or maybe your business has a referral program, and you want to record which customer referred someone. They are both 'customers', in the same table, but one has a FK link to another one.

Hierarchies that are not a good fit for this would be ones where an entity might have more than one "parent" link. For example, suppose you had facebook-style data recording every user and friendship links to other users. That could be made to fit in this model, but then you'd need a new "user" row for every friend that a user had, and every row would be a duplicate except for the "relationshipUserID" column or whatever you called it.

In many-to-many relationships, you would probably have a separate "relationship" table, with a "from" and "to" column, and perhaps a column indicating the relationship type.

score 1 · Answer 2 · answered Apr 19 '17 at 17:20

I found self joins most useful in situations like this:

Get all employees that work for the same manager as Sam. (This does not have to be hierarchical, this can also be: Get all employees that work at the same location as Sam)

select e2.employeeID, e2.name
from employee e1 join employee e2
on (e1.managerID = e2.managerID)
where e1.name = 'Sam'

Also useful to find duplicates in a table, but this can be very inefficient.

score 1 · Answer 3 · answered Apr 20 '17 at 07:57

There are several great examples of using self-joins here. The one I often use relates to "timetables". I work with timetables in education, but they are relevant in other cases too.

I use self-joins to work out whether two items clash with one another, e.g. a student is scheduled for two lessons which happen at the same time, or a room is double booked. For example:

CREATE TABLE StudentEvents(
    StudentId int,
    EventId int,
    EventDate date,
    StartTime time,
    EndTime time
)

SELECT
    se1.StudentId,
    se1.EventDate,
    se1.EventId Event1Id,
    se1.StartTime as Event1Start,
    se1.EndTime as Event1End,
    se2.StartTime as Event2Start,
    se2.EndTime as Event2End,
FROM
    StudentEvents se1
    JOIN StudentEvents se2 ON
        se1.StudentId = se2.StudentId
        AND se1.EventDate = se2.EventDate
        AND se1.EventId > se2.EventId 
        --The above line prevents (a) an event being seen as clashing with itself
        --and (b) the same pair of events being returned twice, once as (A,B) and once as (B,A)
WHERE
    se1.StartTime < se2.EndTime AND
    se1.EndTime > se2.StartTime

Similar logic can be used to find other things in "timetable data", such as a pair of trains it is possible to take from A via B to C.

score 0 · Answer 4 · answered Apr 19 '17 at 17:32

0

Self joins are useful whenever you want to compare records of the same table against each other. Examples are: Find duplicate addresses, find customers where the delivery address is not the same as the invoice address, compare a total in a daily report (saved as record) with the total of the previous day etc.

answered Apr 19 '17 at 17:32

Olivier Jacot-Descombes

104,806
13
138
188

1

The examples aren't particularly good; `GROUP BY` + `HAVING COUNT(*) ....` are more efficient ways to find duplicates (and mismatches). – Apr 19 '17 at 22:57

Purpose of Self-Joins

4 Answers4