SQL is great for many things, but hierarchical data is one of its bigger challenges. Some vendors have provided custom extensions to work around this (e.g. Oracle's CONNECT BY syntax or SQL Server's hierarchyid data type), but we probably want to keep this standard SQL [1].
What you have modeled is called an "adjacency list". It's very simple and straightforward, and always consistent [2]. But as you found out, it's a pain for querying, especially to an unknown depth or for a subtree rather than from the root node.
Therefore, we need to supplement it with an additional model. There are basically three other models that you can use in conjunction with the adjacency list model:
- Nested sets
- Materialized Path
- Ancestry traversal closure
To study them in depth, we'll use this diagram:

For this discussion, we also assume this is a simple hierarchy with no cycles.
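As a baseline for the models below, here is a minimal sketch of the adjacency list itself, using Python's built-in sqlite3 module as a stand-in for Access. The Employee table layout, the EMPLOYEES rows (taken from the org chart used throughout this answer) and the helper names are illustrative assumptions. It shows the problem described above: direct reports need one query, but an arbitrary-depth subtree needs one query per level.

```python
import sqlite3

# Sample org chart as (ID, Name, SupervisorID); Alice is the root.
EMPLOYEES = [
    (1, "Alice", None), (2, "Bob", 1), (3, "Christina", 1), (4, "Dwayne", 1),
    (5, "Erin", 2), (6, "Frank", 2), (7, "Georgia", 2), (8, "Harry", 7),
    (9, "Isabella", 3), (10, "Jake", 3), (11, "Kirby", 10), (12, "Lana", 3),
    (13, "Mike", 4), (14, "Norma", 13), (15, "Opus", 4), (16, "Rianna", 4),
]


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Employee ("
        " ID INTEGER PRIMARY KEY,"
        " Name TEXT NOT NULL,"
        " SupervisorID INTEGER REFERENCES Employee(ID))"
    )
    conn.execute("CREATE INDEX ix_employee_supervisor ON Employee (SupervisorID)")
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", EMPLOYEES)
    return conn


def direct_reports(conn: sqlite3.Connection, boss_id: int) -> list[int]:
    # One level down is a single query.
    rows = conn.execute(
        "SELECT ID FROM Employee WHERE SupervisorID = ? ORDER BY ID", (boss_id,)
    )
    return [row[0] for row in rows]


def subtree(conn: sqlite3.Connection, root_id: int) -> list[int]:
    # Arbitrary depth: without a recursive query, the host code has to issue
    # one query per level until a level comes back empty (assumes no cycles).
    result, frontier = [root_id], [root_id]
    while frontier:
        marks = ",".join("?" * len(frontier))
        frontier = [
            row[0]
            for row in conn.execute(
                f"SELECT ID FROM Employee WHERE SupervisorID IN ({marks})", frontier
            )
        ]
        result.extend(frontier)
    return sorted(result)


conn = make_db()
print(direct_reports(conn, 2))  # [5, 6, 7]
print(subtree(conn, 2))         # [2, 5, 6, 7, 8]
```

The loop in subtree() is exactly the round-tripping the other models are meant to avoid.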
Joe Celko's Nested Sets
Basically, you store a "Left" and a "Right" value for each node, which together indicate its position in the tree. The root node always has 1 for "Left" and <count of nodes * 2> for "Right". This is easier to illustrate with a diagram:

Note that each node gets a pair of numbers, one for "Left" and the other for "Right". With that information, you can make some logical deductions. For example, finding all descendants (the whole subtree, not just the direct children) becomes easy: you filter for nodes whose "Left" is greater than the target node's "Left" and whose "Right" is smaller than the target node's "Right".
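A minimal sketch of how the numbers come about and how the descendant filter looks in SQL, again using Python's sqlite3 module as a stand-in. The numbering is a depth-first walk that hands out "Left" on the way down and "Right" on the way back up; visiting children in ID order is an assumption (the original diagram's ordering isn't recoverable), and the Lft/Rgt column names are chosen because LEFT and RIGHT are SQL keywords.

```python
import sqlite3

# Sample org chart as (ID, Name, SupervisorID); Alice is the root.
EMPLOYEES = [
    (1, "Alice", None), (2, "Bob", 1), (3, "Christina", 1), (4, "Dwayne", 1),
    (5, "Erin", 2), (6, "Frank", 2), (7, "Georgia", 2), (8, "Harry", 7),
    (9, "Isabella", 3), (10, "Jake", 3), (11, "Kirby", 10), (12, "Lana", 3),
    (13, "Mike", 4), (14, "Norma", 13), (15, "Opus", 4), (16, "Rianna", 4),
]


def nested_set_numbers(employees):
    """Depth-first numbering: Left on the way down, Right on the way back up.
    Children are visited in ID order (an assumption; any fixed order works)."""
    children: dict[int, list[int]] = {}
    root = None
    for emp_id, _name, boss in employees:
        if boss is None:
            root = emp_id
        else:
            children.setdefault(boss, []).append(emp_id)
    numbers: dict[int, tuple[int, int]] = {}
    counter = 1

    def walk(node: int) -> None:
        nonlocal counter
        left = counter
        counter += 1
        for child in sorted(children.get(node, [])):
            walk(child)
        numbers[node] = (left, counter)
        counter += 1

    walk(root)
    return numbers


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT, Lft INTEGER, Rgt INTEGER)"
)
numbers = nested_set_numbers(EMPLOYEES)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [(emp_id, name, *numbers[emp_id]) for emp_id, name, _boss in EMPLOYEES],
)


def descendants(conn, emp_id):
    # Everything nested strictly inside the target's (Lft, Rgt) interval.
    rows = conn.execute(
        "SELECT d.ID FROM Employee AS d JOIN Employee AS t"
        " ON d.Lft > t.Lft AND d.Rgt < t.Rgt"
        " WHERE t.ID = ? ORDER BY d.ID",
        (emp_id,),
    )
    return [row[0] for row in rows]


print(numbers[1])            # (1, 32): 16 nodes, so Right = 16 * 2
print(descendants(conn, 2))  # [5, 6, 7, 8]
```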
The biggest downside of this model is that a change to the hierarchy usually requires renumbering a large part of the tree (every node to the right of the change, plus its ancestors), which makes it painful to maintain for a fast-moving chart. If this is something you only update once a year, this might be acceptable.
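To make the maintenance cost concrete, here is a sketch of adding one employee as the last child of an existing node. The hard-coded (ID, Lft, Rgt) rows are what a depth-first numbering with children in ID order gives for the sample chart (an assumption about ordering), and insert_last_child is a hypothetical helper. Every row at or beyond the insertion point has to shift by 2, so how much of the tree moves depends on where the new node lands.

```python
import sqlite3

# (ID, Lft, Rgt) for the sample chart, numbered depth-first with children
# visited in ID order (an assumption about the diagram's ordering).
NESTED = [
    (1, 1, 32), (2, 2, 11), (3, 12, 21), (4, 22, 31),
    (5, 3, 4), (6, 5, 6), (7, 7, 10), (8, 8, 9),
    (9, 13, 14), (10, 15, 18), (11, 16, 17), (12, 19, 20),
    (13, 23, 26), (14, 24, 25), (15, 27, 28), (16, 29, 30),
]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Lft INTEGER, Rgt INTEGER)")
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", NESTED)
    return conn


def insert_last_child(conn, new_id, parent_id):
    """Insert new_id as the last child of parent_id and return how many
    existing rows had to be renumbered to make room."""
    (parent_rgt,) = conn.execute(
        "SELECT Rgt FROM Employee WHERE ID = ?", (parent_id,)
    ).fetchone()
    # Any row with Lft > parent_rgt also has Rgt > parent_rgt, so counting
    # on Rgt alone covers every row the two UPDATEs touch.
    (touched,) = conn.execute(
        "SELECT COUNT(*) FROM Employee WHERE Rgt >= ?", (parent_rgt,)
    ).fetchone()
    with conn:
        conn.execute("UPDATE Employee SET Rgt = Rgt + 2 WHERE Rgt >= ?", (parent_rgt,))
        conn.execute("UPDATE Employee SET Lft = Lft + 2 WHERE Lft > ?", (parent_rgt,))
        conn.execute(
            "INSERT INTO Employee VALUES (?, ?, ?)", (new_id, parent_rgt, parent_rgt + 1)
        )
    return touched


print(insert_last_child(make_db(), 17, 5))   # 16: every existing row moves
print(insert_last_child(make_db(), 17, 16))  # 3: only Rianna, Dwayne and Alice
```

Adding someone under Erin, near the "left" edge of the tree, renumbers every existing row, while adding someone under Rianna touches only three.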
The other issue with this model is that if you need multiple hierarchies, nested sets will not work without additional columns to track each separate hierarchy.
Materialized Path
You know how a filesystem path works, right? This is basically the same thing, except that we store it in the database [3]. For instance, a possible implementation of a materialized path might look like this:
ID Name Path
1 Alice 1/
2 Bob 1/2/
3 Christina 1/3/
4 Dwayne 1/4/
5 Erin 1/2/5/
6 Frank 1/2/6/
7 Georgia 1/2/7/
8 Harry 1/2/7/8/
9 Isabella 1/3/9/
10 Jake 1/3/10/
11 Kirby 1/3/10/11/
12 Lana 1/3/12/
13 Mike 1/4/13/
14 Norma 1/4/13/14/
15 Opus 1/4/15/
16 Rianna 1/4/16/
This is quite intuitive and can perform OK as long as you write your SQL queries to use prefix predicates like WHERE Path LIKE '1/4/*' (that's Access's wildcard; standard SQL uses %). Engines will generally be able to use an index on the Path column for that kind of predicate. However, if your query has to match something in the middle of the path or has to work from the bottom up, the index cannot be used and performance will suffer for it. Still, programming against a materialized path is pretty easy to understand. And unlike nested sets, updating one part of the tree doesn't propagate to unrelated nodes, which is another plus in its favor.
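A minimal sketch of both query directions, assuming the same illustrative Employee table with a Path column built from the adjacency list (build_paths and the other helper names are hypothetical). It uses SQLite's % wildcard where Access would use *. Whether a given engine actually serves the prefix LIKE from an index depends on its own rules (SQLite's LIKE optimization, for one, has extra conditions), so the code only shows the shape of each query.

```python
import sqlite3

# Sample org chart as (ID, Name, SupervisorID); Alice is the root.
EMPLOYEES = [
    (1, "Alice", None), (2, "Bob", 1), (3, "Christina", 1), (4, "Dwayne", 1),
    (5, "Erin", 2), (6, "Frank", 2), (7, "Georgia", 2), (8, "Harry", 7),
    (9, "Isabella", 3), (10, "Jake", 3), (11, "Kirby", 10), (12, "Lana", 3),
    (13, "Mike", 4), (14, "Norma", 13), (15, "Opus", 4), (16, "Rianna", 4),
]


def build_paths(employees):
    """Derive 'ID/ID/.../' paths from (ID, Name, SupervisorID) rows.
    Relies on every supervisor being listed before their reports."""
    paths = {}
    for emp_id, _name, boss in employees:
        parent_path = "" if boss is None else paths[boss]
        paths[emp_id] = f"{parent_path}{emp_id}/"
    return paths


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT, Path TEXT NOT NULL)")
paths = build_paths(EMPLOYEES)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(emp_id, name, paths[emp_id]) for emp_id, name, _boss in EMPLOYEES],
)


def subtree(conn, emp_id):
    # Top-down: a fixed prefix, the shape an index on Path can serve.
    # The trailing '/' on every path means '.../1/' can never match '.../10/'.
    (path,) = conn.execute("SELECT Path FROM Employee WHERE ID = ?", (emp_id,)).fetchone()
    rows = conn.execute(
        "SELECT ID FROM Employee WHERE Path LIKE ? ORDER BY ID", (path + "%",)
    )
    return [row[0] for row in rows]


def ancestors(conn, emp_id):
    # Bottom-up: the pattern is built from each row's own Path, so an index
    # on Path can't narrow the search and every row has to be tested.
    (path,) = conn.execute("SELECT Path FROM Employee WHERE ID = ?", (emp_id,)).fetchone()
    rows = conn.execute(
        "SELECT ID FROM Employee WHERE ? LIKE Path || '%' ORDER BY ID", (path,)
    )
    return [row[0] for row in rows]


print(paths[8])             # 1/2/7/8/
print(subtree(conn, 4))     # [4, 13, 14, 15, 16]
print(ancestors(conn, 14))  # [1, 4, 13, 14]
```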
The biggest downside is that, to be indexable, the path has to be a short text column. For an Access database, that puts a 255-character limit on your Path field. Even worse, there is no good way to predict when you are about to hit the limit: you could hit it because the tree is too deep, or because it's too wide (more nodes mean bigger IDs, which take up more characters at every level). For that reason, large trees might need some hard-coded limit to avoid this situation.
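One way to enforce such a hard-coded limit is to check the length whenever a path is built. The sketch below is hypothetical (child_path, max_safe_depth and MAX_PATH_LENGTH are illustrative names, with 255 taken from the limit mentioned above); max_safe_depth shows how wider trees, with longer IDs, shrink the usable depth.

```python
# Hypothetical guard for a materialized path stored in a 255-character
# text field (the limit mentioned above for an indexable Access field).
MAX_PATH_LENGTH = 255


def child_path(parent_path: str, child_id: int, max_len: int = MAX_PATH_LENGTH) -> str:
    """Build a child's path, refusing to produce one longer than max_len."""
    path = f"{parent_path}{child_id}/"
    if len(path) > max_len:
        raise ValueError(f"path would be {len(path)} characters; the limit is {max_len}")
    return path


def max_safe_depth(largest_id: int, max_len: int = MAX_PATH_LENGTH) -> int:
    """Levels guaranteed to fit if every ID on a path were as long as
    largest_id: each level costs its digits plus one '/'."""
    return max_len // (len(str(largest_id)) + 1)


print(child_path("1/4/", 13))   # 1/4/13/
print(max_safe_depth(16))       # 85 levels with 2-digit IDs
print(max_safe_depth(999_999))  # 36 levels once IDs reach 6 digits
```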
Ancestry Traversal Closure
This model involves a separate table that is updated whenever the employee table is updated. Instead of recording only the immediate relationship, we enumerate every ancestor of each node, including the node itself. To illustrate, this is what the tables look like:
Employee table:
ID Name
1 Alice
2 Bob
3 Christina
4 Dwayne
5 Erin
6 Frank
7 Georgia
8 Harry
9 Isabella
10 Jake
11 Kirby
12 Lana
13 Mike
14 Norma
15 Opus
16 Rianna
Employee Ancestry Table:
Origin Ancestor
1 1
2 1
2 2
3 1
3 3
4 1
4 4
5 1
5 2
5 5
6 1
6 2
6 6
7 1
7 2
7 7
8 1
8 2
8 7
8 8
9 1
9 3
9 9
10 1
10 3
10 10
11 1
11 3
11 10
11 11
12 1
12 3
12 12
13 1
13 4
13 13
14 1
14 4
14 13
14 14
15 1
15 4
15 15
16 1
16 4
16 16
As you can see, we generate one row for every ancestor/descendant pair, including each node paired with itself. As a bonus, because it's a table, we can use foreign keys and cascading deletes to help keep it consistent. We still have to manage the inserts and updates manually, however. Because the table is also narrow, it's easy to write queries that use indexes on the Origin and Ancestor columns to find a subtree, the children, or the chain of supervisors. This is the most flexible of the three models, at the expense of extra maintenance complexity.
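A minimal sketch of the two tables in SQLite (via Python's sqlite3 module), assuming the same illustrative Employee layout. The closure rows are generated by walking up from each employee, which reproduces the 46 rows of the table above. It shows that both directions are single indexed lookups, and that ON DELETE CASCADE cleans up the ancestry rows (SQLite only enforces foreign keys after PRAGMA foreign_keys = ON).

```python
import sqlite3

# Sample org chart as (ID, Name, SupervisorID); Alice is the root.
EMPLOYEES = [
    (1, "Alice", None), (2, "Bob", 1), (3, "Christina", 1), (4, "Dwayne", 1),
    (5, "Erin", 2), (6, "Frank", 2), (7, "Georgia", 2), (8, "Harry", 7),
    (9, "Isabella", 3), (10, "Jake", 3), (11, "Kirby", 10), (12, "Lana", 3),
    (13, "Mike", 4), (14, "Norma", 13), (15, "Opus", 4), (16, "Rianna", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Employee (
    ID           INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL,
    SupervisorID INTEGER REFERENCES Employee(ID)
);
CREATE TABLE EmployeeAncestry (
    Origin   INTEGER NOT NULL REFERENCES Employee(ID) ON DELETE CASCADE,
    Ancestor INTEGER NOT NULL REFERENCES Employee(ID) ON DELETE CASCADE,
    PRIMARY KEY (Origin, Ancestor)
);
CREATE INDEX ix_ancestry_ancestor ON EmployeeAncestry (Ancestor);
""")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", EMPLOYEES)

# Fill the closure table by walking up from each employee to the root.
boss_of = {emp_id: boss for emp_id, _name, boss in EMPLOYEES}
for emp_id in boss_of:
    node = emp_id
    while node is not None:
        conn.execute("INSERT INTO EmployeeAncestry VALUES (?, ?)", (emp_id, node))
        node = boss_of[node]
conn.commit()


def subtree(conn, emp_id):
    rows = conn.execute(
        "SELECT Origin FROM EmployeeAncestry WHERE Ancestor = ? ORDER BY Origin", (emp_id,)
    )
    return [row[0] for row in rows]


def ancestors(conn, emp_id):
    rows = conn.execute(
        "SELECT Ancestor FROM EmployeeAncestry WHERE Origin = ? ORDER BY Ancestor", (emp_id,)
    )
    return [row[0] for row in rows]


print(subtree(conn, 3))     # [3, 9, 10, 11, 12]
print(ancestors(conn, 11))  # [1, 3, 10, 11]
print(conn.execute("SELECT COUNT(*) FROM EmployeeAncestry").fetchone()[0])  # 46

# Cascade delete: removing Norma also removes her four ancestry rows.
conn.execute("DELETE FROM Employee WHERE ID = 14")
print(conn.execute("SELECT COUNT(*) FROM EmployeeAncestry").fetchone()[0])  # 42
```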
Maintaining the model
All three models discussed here denormalize the data somewhat in order to simplify queries and support searches of arbitrary depth. As a consequence, we have to manage the changes ourselves whenever the employee table is modified.
The simplest approach is to write a VBA procedure that empties and rebuilds the entire chart using your preferred model. This can work very well when the chart is small or does not change often.
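The same truncate-and-rebuild idea, sketched in Python and SQLite rather than VBA, for the ancestry table. The function name and schema are illustrative assumptions. It uses only plain INSERT ... SELECT statements (no recursive query), repeating one join per level until a pass adds no rows, so the same loop could be driven from VBA.

```python
import sqlite3

# Sample org chart as (ID, Name, SupervisorID); Alice is the root.
EMPLOYEES = [
    (1, "Alice", None), (2, "Bob", 1), (3, "Christina", 1), (4, "Dwayne", 1),
    (5, "Erin", 2), (6, "Frank", 2), (7, "Georgia", 2), (8, "Harry", 7),
    (9, "Isabella", 3), (10, "Jake", 3), (11, "Kirby", 10), (12, "Lana", 3),
    (13, "Mike", 4), (14, "Norma", 13), (15, "Opus", 4), (16, "Rianna", 4),
]


def rebuild_ancestry(conn):
    """Throw away the ancestry table and regenerate it from SupervisorID alone."""
    with conn:  # one transaction, committed only if every step succeeds
        conn.execute("DELETE FROM EmployeeAncestry")  # SQLite has no TRUNCATE
        # Level 0: everyone is their own ancestor.
        conn.execute(
            "INSERT INTO EmployeeAncestry (Origin, Ancestor) SELECT ID, ID FROM Employee"
        )
        # Each pass climbs one more level; stop when a pass adds nothing.
        while True:
            cur = conn.execute(
                """
                INSERT INTO EmployeeAncestry (Origin, Ancestor)
                SELECT a.Origin, e.SupervisorID
                FROM EmployeeAncestry AS a
                JOIN Employee AS e ON e.ID = a.Ancestor
                WHERE e.SupervisorID IS NOT NULL
                  AND NOT EXISTS (SELECT 1 FROM EmployeeAncestry AS x
                                  WHERE x.Origin = a.Origin
                                    AND x.Ancestor = e.SupervisorID)
                """
            )
            if cur.rowcount == 0:
                break


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL, SupervisorID INTEGER);
CREATE TABLE EmployeeAncestry (
    Origin INTEGER NOT NULL, Ancestor INTEGER NOT NULL, PRIMARY KEY (Origin, Ancestor));
""")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", EMPLOYEES)
rebuild_ancestry(conn)

# A reorganisation is just an UPDATE of SupervisorID followed by a full rebuild.
conn.execute("UPDATE Employee SET SupervisorID = 4 WHERE ID = 10")  # Jake moves to Dwayne
rebuild_ancestry(conn)
print([r[0] for r in conn.execute(
    "SELECT Ancestor FROM EmployeeAncestry WHERE Origin = 11 ORDER BY Ancestor")])
# [1, 4, 10, 11]
```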
At the other end, you could consider using Data Macros on your employee table to propagate updates to the hierarchy. A caveat, though: data macros make it harder to port the database to another RDBMS, since none of them support data macros. (To be fair, the same problem exists when porting SQL Server's stored procedures and triggers to Oracle's; those are so steeped in each vendor's dialect that porting is a challenge.) Using data macros, or triggers plus stored procedures, means that you can rely on the engine to maintain the hierarchy for you without any programming in the forms.
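Data macros can't be demonstrated outside Access, so here is the equivalent engine-side idea as a SQLite trigger (run from Python's sqlite3 module). This is not data macro syntax; the trigger name, schema and the "New Hire" row are hypothetical. On insert, the new employee inherits every ancestry row of their supervisor, plus a row pointing at themselves.

```python
import sqlite3

# Sample org chart as (ID, Name, SupervisorID); Alice is the root.
EMPLOYEES = [
    (1, "Alice", None), (2, "Bob", 1), (3, "Christina", 1), (4, "Dwayne", 1),
    (5, "Erin", 2), (6, "Frank", 2), (7, "Georgia", 2), (8, "Harry", 7),
    (9, "Isabella", 3), (10, "Jake", 3), (11, "Kirby", 10), (12, "Lana", 3),
    (13, "Mike", 4), (14, "Norma", 13), (15, "Opus", 4), (16, "Rianna", 4),
]

SCHEMA = """
CREATE TABLE Employee (
    ID           INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL,
    SupervisorID INTEGER REFERENCES Employee(ID)
);
CREATE TABLE EmployeeAncestry (
    Origin   INTEGER NOT NULL,
    Ancestor INTEGER NOT NULL,
    PRIMARY KEY (Origin, Ancestor)
);
-- A new employee copies the supervisor's ancestry rows, plus a self row.
-- A root (NULL supervisor) matches no rows and gets only the self row.
CREATE TRIGGER trg_employee_insert AFTER INSERT ON Employee
BEGIN
    INSERT INTO EmployeeAncestry (Origin, Ancestor)
        SELECT NEW.ID, Ancestor FROM EmployeeAncestry WHERE Origin = NEW.SupervisorID;
    INSERT INTO EmployeeAncestry (Origin, Ancestor) VALUES (NEW.ID, NEW.ID);
END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Supervisors must be inserted before their reports for the trigger to find them.
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", EMPLOYEES)
conn.execute("INSERT INTO Employee VALUES (17, 'New Hire', 13)")  # hypothetical report to Mike

print([r[0] for r in conn.execute(
    "SELECT Ancestor FROM EmployeeAncestry WHERE Origin = 17 ORDER BY Ancestor")])
# [1, 4, 13, 17]
```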
A common temptation is to use the form's AfterUpdate event to maintain the changes, and that works... unless someone updates the data outside the form. For that reason, I would actually prefer a data macro over relying on everyone to always use the form.
Note that in all of this discussion, we should NOT discard the adjacency list model. As I mentioned earlier, it is the most normalized and consistent way to model the hierarchy; it is literally impossible to create a nonsensical hierarchy with it. For that reason alone, you should keep it as your "authoritative truth", on which you then build your chosen model to speed up querying.
Another good reason to keep the adjacency list model is that every model above introduces additional columns or tables that are not meant to be edited directly by users. They serve much the same purpose as a calculated field and should not be tinkered with. If users are allowed to edit only the SupervisorID field, it becomes easy to write your data macros, triggers, or VBA procedure around that one field and have them update the "calculations" in the additional fields or tables, keeping the queries that depend on those models correct.
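Here is a sketch of what "code around that one field" could look like for the ancestry table, again as a SQLite trigger standing in for a data macro; the trigger, table and column names are illustrative. When SupervisorID changes, the moved employee's whole subtree is detached from its old ancestors and attached to the new supervisor's ancestors. It also rejects a move under one's own subordinate, since that would create the cycle this discussion rules out.

```python
import sqlite3

# Sample org chart as (ID, Name, SupervisorID); Alice is the root.
EMPLOYEES = [
    (1, "Alice", None), (2, "Bob", 1), (3, "Christina", 1), (4, "Dwayne", 1),
    (5, "Erin", 2), (6, "Frank", 2), (7, "Georgia", 2), (8, "Harry", 7),
    (9, "Isabella", 3), (10, "Jake", 3), (11, "Kirby", 10), (12, "Lana", 3),
    (13, "Mike", 4), (14, "Norma", 13), (15, "Opus", 4), (16, "Rianna", 4),
]

SCHEMA = """
CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL, SupervisorID INTEGER);
CREATE TABLE EmployeeAncestry (
    Origin INTEGER NOT NULL, Ancestor INTEGER NOT NULL, PRIMARY KEY (Origin, Ancestor));

CREATE TRIGGER trg_employee_move
AFTER UPDATE OF SupervisorID ON Employee
WHEN NEW.SupervisorID IS NOT OLD.SupervisorID
BEGIN
    -- Refuse to move someone under themselves or their own subordinate.
    SELECT RAISE(ABORT, 'cannot report to a subordinate')
    WHERE EXISTS (SELECT 1 FROM EmployeeAncestry
                  WHERE Origin = NEW.SupervisorID AND Ancestor = NEW.ID);
    -- Detach: the moved subtree forgets its old ancestors.
    DELETE FROM EmployeeAncestry
    WHERE Origin IN (SELECT Origin FROM EmployeeAncestry WHERE Ancestor = NEW.ID)
      AND Ancestor IN (SELECT Ancestor FROM EmployeeAncestry
                       WHERE Origin = NEW.ID AND Ancestor <> NEW.ID);
    -- Reattach: every node in the subtree gains every ancestor of the new supervisor.
    INSERT INTO EmployeeAncestry (Origin, Ancestor)
    SELECT sub.Origin, sup.Ancestor
    FROM EmployeeAncestry AS sub, EmployeeAncestry AS sup
    WHERE sub.Ancestor = NEW.ID AND sup.Origin = NEW.SupervisorID;
END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", EMPLOYEES)
# Seed the ancestry table once by walking up from each employee.
boss_of = {emp_id: boss for emp_id, _name, boss in EMPLOYEES}
for emp_id in boss_of:
    node = emp_id
    while node is not None:
        conn.execute("INSERT INTO EmployeeAncestry VALUES (?, ?)", (emp_id, node))
        node = boss_of[node]


def ancestors(emp_id):
    return [r[0] for r in conn.execute(
        "SELECT Ancestor FROM EmployeeAncestry WHERE Origin = ? ORDER BY Ancestor", (emp_id,))]


# Users only edit SupervisorID; the trigger keeps the ancestry table in step.
conn.execute("UPDATE Employee SET SupervisorID = 4 WHERE ID = 10")  # Jake's team moves to Dwayne
print(ancestors(11))  # [1, 4, 10, 11]

try:
    conn.execute("UPDATE Employee SET SupervisorID = 11 WHERE ID = 4")  # Dwayne under Kirby
except sqlite3.DatabaseError as exc:
    print("rejected:", exc)
```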
1. The SQL standard does describe a way to write a recursive query. However, support for that particular feature is uneven (Access, for one, doesn't support it), and the performance may not be that great (which is the case with SQL Server's implementation). The three models discussed are easily implemented in most RDBMSs, and queries against the hierarchy can be easily written and ported. However, automatically managing changes to the hierarchy invariably requires vendor-specific dialect, in triggers or stored procedures, which is not very portable.
2. When I say consistent, I only mean that the model cannot produce nonsensical output. It's still possible to enter wrong data and create a weird hierarchy, such as an employee's supervisor reporting to that employee, but never one that gives undefined results: it is still a hierarchy, even if it ends up as a cyclic graph. With the other models, failing to maintain the derived data correctly means the queries will start returning undefined results.
3. SQL Server's hierarchyid data type is in fact an implementation of this model.
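For comparison with note 1, here is a minimal sketch of the standard recursive query (a recursive common table expression) run directly against the adjacency list. It uses SQLite through Python's sqlite3 module, since SQLite accepts WITH RECURSIVE. SQL Server writes the same query with a plain WITH, and Access has no equivalent. The table layout and the SUBTREE_SQL name are illustrative assumptions, and the query assumes no cycles.

```python
import sqlite3

# Sample org chart as (ID, Name, SupervisorID); Alice is the root.
EMPLOYEES = [
    (1, "Alice", None), (2, "Bob", 1), (3, "Christina", 1), (4, "Dwayne", 1),
    (5, "Erin", 2), (6, "Frank", 2), (7, "Georgia", 2), (8, "Harry", 7),
    (9, "Isabella", 3), (10, "Jake", 3), (11, "Kirby", 10), (12, "Lana", 3),
    (13, "Mike", 4), (14, "Norma", 13), (15, "Opus", 4), (16, "Rianna", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL, SupervisorID INTEGER)"
)
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", EMPLOYEES)

# Anchor member: the starting employee at depth 0.
# Recursive member: anyone whose supervisor is already in Subtree, one level deeper.
SUBTREE_SQL = """
WITH RECURSIVE Subtree (ID, Depth) AS (
    SELECT ID, 0 FROM Employee WHERE ID = ?
    UNION ALL
    SELECT e.ID, s.Depth + 1
    FROM Employee AS e
    JOIN Subtree AS s ON e.SupervisorID = s.ID
)
SELECT ID, Depth FROM Subtree ORDER BY ID
"""

print(conn.execute(SUBTREE_SQL, (3,)).fetchall())
# [(3, 0), (9, 1), (10, 1), (11, 2), (12, 1)]
```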