MS-Access SQL Self Join with multiple levels

Question

I have a table "People" with a primary key "PersonID" and a field that is "Supervisor". The "Supervisor" field contains the foreign key of a "PersonID" to create a self join.

I would like to create an sql query that returns all people with "Me" (the PersonID that is logged into the database) as their supervisor, and anyone that has someone on that list labeled as their supervisor. Essentially I would like to list anyone below the supplied PersonID in the chain of command.

I think that ms-access does not support recursive queries. If you have a limited number of hierarchy levels, you could imagine implement this with a series of joins... — GMB, Feb 05 '19 at 22:21
You can manage it using vba, using a loop or a recursive function, that queries the table for your `PersonId`, looks for the `Supervisor`, execute the same query for the `PersonId` of the `Supervisor`..., creating an array with the results gotten, exiting the loop when the query returns nothing. — James, Feb 05 '19 at 22:48

this · Answer 1 · 2019-02-06T15:02:55.337

SQL is great for many things, but hierarchical data is one of bigger challenges. Some vendors has provided custom extensions to work around this (e.g. Oracle's CONNECT syntax or SQL Server's hierarchyid data type), but we probably want to keep this standard SQL¹.

What you have modeled is called "adjacency list" -- this is very simple and straightforward, and always consistent². But as you found out, this sucks for querying, especially for an unknown depth or for a subtree, rather than from the root node.

Therefore, we need to supplement this with an additional model. There are basically 3 other models that you should use in conjunction with the adjacency list model.

Nested sets
Materialized Path
Ancestry traversal closure

To study them in depth, we'll use this diagram:

For this discussion, we are also assuming this is a simple hierarchy, that there are no cycles.

Joe Celko's Nested Sets.

Basically, you store the "Left" and "Right" value of each node which indicates its position in the tree. The root node will always have 1 for "Left" and <count of nodes * 2> for "Right". This is easier to illustrate with a diagram:

Note that each node gets assigned a pair of number, one for "Left", and other for "Right". With that information, you can do some logical deductions. Finding all children becomes easy - you filter for values where the nodes' "Left" is greater than the target node's "Left" and where the same nodes' "Right" is smaller than the target node's "Right".

The biggest downside with the model is that a change to the hierarchy almost invariably requires updating the entire tree, which makes it very awful to maintain for a fast moving charts. If this is something you only update once a year, this might be acceptable.

The other issue with this model is that if there is a need for a multiple hierarchies, the nested set will not work without additional columns to track the separate hierarchy.

Materialized Path

You know how a filesystem path works, right? This is basically the same thing, except that we are storing this in the database³. For instance, a possible implementation of a materialized path might look like this:

ID  Name      Path
1   Alice     1/
2   Bob       1/2/
3   Christina 1/3/
4   Dwayne    1/4/
5   Erin      1/2/5/
6   Frank     1/2/6/
7   Georgia   1/2/7/
8   Harry     1/2/7/8/
9   Isabella  1/3/9/
10  Jake      1/3/10/
11  Kirby     1/3/10/11/
12  Lana      1/3/12/
13  Mike      1/4/13/
14  Norma     1/4/13/14/
15  Opus      1/4/15/
16  Rianna    1/4/16/

This is quite intuitive and can perform OK as long you write your SQL queries to use predicates like WHERE Path LIKE '1/4/*'. Engines will be able to use index on the path column. Note that if your queries involve querying a middle of the tree or from bottom up, that means index cannot be used and performance will suffer for it. But programming against a materialized path is pretty easy to understand. Updating a part of the tree won't propagate to unrelated nodes as the nested sets so that's also a plus in its favor.

The biggest downside is that to be indexable, the text has to be a short column. For Access database that puts a 255 character limit on your path field. Even worse, there is no good way to predict when you are about to hit the limit -- you could hit it because you have too deep tree, or because you have too wide tree (e.g. bigger numbers taking up too much spaces). For that reason, large trees might necessitate some hard-coded limit to avoid this situation.

Ancestry Traversal Closure

This model involves a separate table which is updated whenever the employee table is updated. Instead of only recording the immediate relationship, we enumerate all the ancestry between two nodes. To illustrate, this is how the table will look like:

Employee table:

ID  Name
1   Alice
2   Bob
3   Christina
4   Dwayne
5   Erin
6   Frank
7   Georgia
8   Harry
9   Isabella
10  Jake
11  Kirby
12  Lana
13  Mike
14  Norma
15  Opus
16  Rianna

Employee Ancestry Table:

Origin  Ancestor
1        1
2        1
2        2
3        1
3        3
4        1
4        4
5        1
5        2
5        5
6        1
6        2
6        6
7        1
7        2
7        7
8        1
8        2
8        7
8        8
9        1
9        3
9        9
10       1
10       3
10       10
11       1
11       3
11       10
11       11
12       1
12       3
12       12
13       1
13       4
14       1
14       4
14       13
14       14
15       1
15       4
15       15
16       1
16       4
16       16

As you see, we generate several rows worth of all possible relationship between two nodes. As a bonus because it's a table, we can make use of foreign key and cascade delete to help keep it consistent. We still have to manually manage the inserts & updates however. Because the table is also narrow, it makes it very easy to create query that can leverage index on the key, the origin and the ancestor to find the subtree, the children, the parent. This is the most flexible system at expense of extra complexity around the maintenance.

Maintaining the model

All 3 models discussed are basically denormalizing the data a bit in order to simplify the query and support an arbitrary depth search. A consequence of that is this necessitates us to manually manage the changes when the employee table is modified in some fashion.

The most simplest approach is simply to just write a VBA procedure that will truncate and re-build the entire chart using your preferred model. This can work very well when the chart is small or does not change often.

On the other end, you could consider using Data Macros on your employee table to perform the maintenance required to propagate the updates to the hierarchy. A caveat, though, if you use data macros, this makes it harder to port the data to another RDBMS system since none of those support data macros. (To be fair, the problem would still exist if you were porting from SQL Server's stored procedures/triggers to Oracle's stored procedure/triggers - those are very steeped in vendor's dialect that porting is a challenge). Using data macros or trigger + stored procedure mean that you can rely on the engine to maintain the hierarchy for you without any programming in the forms.

A common temptation is to use form's AfterUpdate event to maintain the changes and that would work.... unless someone update it outside the form. For that reason, I would actually prefer that we used a data macro rather than relying on everyone to always use the form.

Note that in all of this discussion, we should NOT discard the adjacency list model. As I commented earlier, this is the most normalized and consistent way to model the hierarchy. It is literally impossible to create a nonsensical hierarchy with it. For that reason alone, you should keep it as your "authoritative truth", which you can then build your model upon to aid the querying performance.

Another good reason to keep using the adjacency list model is regardless of which model you use above, they introduce either additional columns or additional tables that are not meant to be directly edited by users but are for purpose somewhat equivalent to a calculated field and thus should not be tinkered with. If the users are allowed to edit only the SupervisorID field, then it becomes easy to code your data macros/triggers/VBA procedure around that one field, and updating the "calculations" of the additional fields/table to ensure correctness for the queries depending on such models.

^{1. SQL Standard does describe a way to create a recursive query. However, the compliance for that particular feature seems to be poor. Furthermore, the performance may not be that great. (which is the case with SQL Server's particular implementation) The 3 models discussed are easily implemented in most of RDBMS and queries for querying the hierarchy can be easily written and ported. However, the implementation to automatically manage the changes to the hierarchy invariably requires vendor-specific dialect, using triggers or stored procedure which is not very portable.}

^{2. When I say consistent, I only mean that the model cannot create a nonsensical output. It's still possible to provide wrong data and make a weird hierarchy such as an employee's supervisor reporting to the employee, but not one that would give undefined results. However, it still is a hierarchy (even if it ends up as a cyclical graph). With other models, failing to maintain the derived data correctly means the queries will start returning undefined results.}

^{3. SQL Server's hierarchyid data type is in fact an implementation of this model.}

"but we probably want to keep this standard SQL" The SQL standard has WITH RECURSIVE for such problems. — Erwin Smout, Feb 06 '19 at 10:05
Great! Unfortunately, Access SQL does not implement this. I'm not sure whether other vendors implements that, either. For example, [SQL Server allows recursive CTEs but not using the standard syntax](https://learn.microsoft.com/en-us/sql/t-sql/queries/with-common-table-expression-transact-sql?view=sql-server-2017). [Thus, the compliance for this feature seems questionable](https://en.wikipedia.org/wiki/Hierarchical_and_recursive_queries_in_SQL). Furthermore, even if this was SQL Server, [there can be performance issues](http://www.sqlservercentral.com/articles/T-SQL/2926/). — this, Feb 06 '19 at 11:18
Yes. Oracle does not even ***want*** to introduce the feature because they already had a solution (CONNECT BY) before the feature was introduced into the standard. I only intended to point out that "probably want to keep this standard" is a bit of a poor argument. — Erwin Smout, Feb 06 '19 at 11:31
Also want to note that "It is literally impossible to create a nonsensical hierarchy with it [adjacency list]" is a very far stretch. The structure per se does not prohibit "hierarchies" that are in fact cyclical graphs (and therefore not hierarchies at all) and preventing that is nowhere near trivial. — Erwin Smout, Feb 06 '19 at 11:33
Right - the "standard" part is simply that those 3 models can be setup to work on any other RDBMS, not just Access. The hard part is the implementation of the automatic management since that implies using triggers + stored procedures, which invariably becomes vendor-specific. I look at the model as basically creating an index to boost your query performance, albeit in much more manual manner. — this, Feb 06 '19 at 11:36
RE: the consistency comment - no, I was not referring to the cyclical graph which is allowed as indicated in the footnote. My definition is much more narrower - for any given inputs, you will be able to build a hierarchy with adjacency list. You can't have that with nested sets model (for example) because the left/right values must be a specific value. Failing to provide the correct values will mean the queries will start returning nonsensical results. Updated the footnote which hopefully makes that clearer. — this, Feb 06 '19 at 11:40
Cyclical graphs, if you don't prevent them, make computation of materialized paths nonterminating [unless it's explicitly countered in the computation itself of course]. As I'm sure you know. — Erwin Smout, Feb 06 '19 at 11:50

score 1 · Answer 2 · answered Feb 06 '19 at 08:24

As you probably will have a rather limited count, say six, of levels deep, you can use a query with subqueries with subqueries ... etc. Very simple.

For an unlimited number of levels, the fastest way I've found, is to create a lookup function which walks the tree for each record. This can output either the level of the record or a compound key build by the key of the record and all keys above.

As the lookup function will use the same recordset for every call, you can make it static, and (for JET) you can improve further by using Seek to locate the records.

Here's an example which will give you an idea:

Public Function RecursiveLookup(ByVal lngID As Long) As String

  Static dbs      As Database
  Static tbl      As TableDef
  Static rst      As Recordset

  Dim lngLevel    As Long
  Dim strAccount  As String

  If dbs Is Nothing Then
    ' For testing only.
    ' Replace with OpenDatabase of backend database file.
    Set dbs = CurrentDb()
    Set tbl = dbs.TableDefs("tblAccount")
    Set rst = dbs.OpenRecordset(tbl.Name, dbOpenTable)
  End If

  With rst
    .Index = "PrimaryKey"
    While lngID > 0
      .Seek "=", lngID
      If Not .NoMatch Then
        lngLevel = lngLevel + 1
        lngID = !MasterAccountFK.Value
        If lngID > 0 Then
          strAccount = str(!AccountID) & strAccount
        End If
      Else
        lngID = 0
      End If
    Wend
    ' Leave recordset open.
    ' .Close
  End With

'  Don't terminate static objects.
'  Set rst = Nothing
'  Set tbl = Nothing
'  Set dbs = Nothing

'  Alternative expression for returning the level.
'  (Adjust vartype of return value of function.) '  RecursiveLookup = lngLevel ' As Long
  RecursiveLookup = strAccount

End Function

This assumes a table with a primary key ID and a foreign (master) key pointing to the parent record - and a top level record (not used) with a visible key (AccountID) of 0.

Now your tree will be nicely shown almost instantaneously using a query like this where Account will be the visible compound key:

  SELECT
    *, RecursiveLookup([ID]) AS Account
  FROM
    tblAccount
  WHERE
    (AccountID > 0)
  ORDER BY
    RecursiveLookup([ID]);

If you wish to use this to add records to another table, you should not make an SQL call for each, as this is very slow, but first open a recordset, then use AddNew-Update to append each record and, finally, close this recordset.

My apologies. I did not mention that this was to set a recordsource for a form. — Josh Johner, Feb 07 '19 at 09:51

score 0 · Accepted Answer · answered Feb 06 '19 at 00:22

Consider the following set of functions:

Function BuildQuerySQL(lngsid As Long) As String
    Dim intlvl As Integer
    Dim strsel As String: strsel = selsql(intlvl)
    Dim strfrm As String: strfrm = "people as p0 "
    Dim strwhr As String: strwhr = "where p0.supervisor = " & lngsid

    While HasRecordsP(strsel & strfrm & strwhr)
        intlvl = intlvl + 1
        BuildQuerySQL = BuildQuerySQL & " union " & strsel & strfrm & strwhr
        strsel = selsql(intlvl)
        If intlvl > 1 Then
            strfrm = "(" & strfrm & ")" & frmsql(intlvl)
        Else
            strfrm = strfrm & frmsql(intlvl)
        End If
    Wend
    BuildQuerySQL = Mid(BuildQuerySQL, 8)
End Function

Function HasRecordsP(strSQL As String) As Boolean
    Dim dbs As DAO.Database
    Set dbs = CurrentDb
    With dbs.OpenRecordset(strSQL)
        HasRecordsP = Not .EOF
        .Close
    End With
    Set dbs = Nothing
End Function

Function selsql(intlvl As Integer) As String
    selsql = "select p" & intlvl & ".personid from "
End Function

Function frmsql(intlvl As Integer) As String
    frmsql = " inner join people as p" & intlvl & " on p" & intlvl - 1 & ".personid = p" & intlvl & ".supervisor "
End Function

Here, the BuildQuerySQL function may be supplied with the PersonID corresponding to a Supervisor and the function will return 'recursive' SQL code for an appropriate query to obtain the PersonID for all subordinates of the supervisor.

Such function may therefore be evaluated to construct a saved query, e.g. for a supervisor with PersonID = 5, creating a query called Subordinates:

Sub test()
    CurrentDb.CreateQueryDef "Subordinates", BuildQuerySQL(5)
End Sub

Or the SQL may be evaluated to open a RecordSet of the results perhaps, depending on the requirements of your application.

Note that the function constructs a UNION query, with each level of nesting unioned with the previous query.

I think I'm understanding this correctly. But visualizing this sql statement (if I'm understanding it correctly) is almost impossible as it creates a new union for each subordinate. So if I had 10 managers, each with 3 supervisors then my statement would have 30 unions? What is the performance implications of this method. Have you ever had it cause issues? — Josh Johner, Feb 07 '19 at 10:09
@JoshJohner It doesn't create a `union` for each subordinate, but rather a `union` for each **level** of the hierarchy - as such, the complexity and performance of the query is dependent on the number of levels in your management structure. — Lee Mac, Feb 07 '19 at 19:48

score 0 · Answer 4 · answered Feb 07 '19 at 00:28

After considering the options presented here I have decided that I am going about this the wrong way. I have added a field to the "People" table "PermissionsLevel" which is a lookup from another table with a simple "PermissionNumber" and "PermissionDescription". I then use a select case in the Form_load() event for the logged in user's permission level.

Select Case userPermissionLevel

    Case Creator
        'Queries everyone in the database

    Case Administrator    
        'Queries everyone in the "Department" they are a member of

    Case Supervisor
        'Queries all people WHERE supervisor = userID OR _
            supervisor IN (Select PersonID From People WHERE supervisor = userID)

    Case Custodian '(Person in charge of maintaining the HAZMAT Cabinet and SDS)
        'Queries WHERE supervisor = DLookup("Supervisor", "People", "PersonID = " & userID)

After putting this to use, I realized that this does not get me the third level down from the supervisor (user). I am going to use some of the techniques presented by Lee Mac to union queries to get me to where i need to be. — Josh Johner, Feb 07 '19 at 10:11

MS-Access SQL Self Join with multiple levels

4 Answers4

Linked