I have a large hierarchy (2,500+ records) stored in Microsoft SQL Server (2019) using an adjacency list model (e.g., Id
, ParentId
). I am looking for an efficient approach to look up a record based on a particular path in the hierarchy. In other words, given a path (e.g. /Root/FolderA/SubfolderA
), I'd like to retrieve the Id
associated with the final node (i.e., SubfolderA
in this case).
Note: Node names are not globally unique. I.e., we can't just look for
SubfolderA
and assume it maps to/Root/FolderA/SubfolderA
. There may be multiple nodes namedSubfolderA
within the hierarchy.
Setup
Hierarchy
/Root
/FolderA
/SubfolderA
/SubfolderB
/FolderB
/SubfolderA
/SubfolderB
Structure
CREATE
TABLE [dbo].[Tree] (
[Id] INT NOT NULL PRIMARY KEY,
[ParentId] INT NULL,
[Name] VARCHAR(255) NOT NULL,
CONSTRAINT [FK_Hierarchy]
FOREIGN KEY (ParentId)
REFERENCES [Tree]([Id])
)
Data
INSERT INTO Tree VALUES (1, NULL, 'Root');
INSERT INTO Tree VALUES (2, 1, 'FolderA');
INSERT INTO Tree VALUES (3, 2, 'SubfolderA');
INSERT INTO Tree VALUES (4, 2, 'SubfolderB');
INSERT INTO Tree VALUES (5, 1, 'FolderB');
INSERT INTO Tree VALUES (6, 5, 'SubfolderA');
INSERT INTO Tree VALUES (7, 5, 'SubfolderB');
Naïve Approach
There are quite a few threads on how to convert an adjacency list to materialized paths, including:
- SQL - Convert non-null adjacency list to path
- Build Enumeration Path from Adjacency List in SQL
- Flatten Adjacency List Hierarchy To A List Of All Paths
- Load hierarchical data from MSSQL with recursive common table expressions
View
We can use one of these approaches to convert the entire adjacency list into materialized paths using an rCTE:
CREATE
VIEW [dbo].[MaterializedPaths]
WITH SCHEMABINDING
AS
WITH RCTE AS (
SELECT Id,
ParentId,
CAST('/' + Name AS VARCHAR(255)) AS Path
FROM [dbo].[Tree] root
WHERE root.Id = 1
UNION ALL
SELECT this.Id,
this.ParentId,
CAST(parent.Path + '/' + this.Name AS VARCHAR(255)) AS Path
FROM [dbo].[Tree] AS this
INNER JOIN RCTE parent
ON this.ParentId = parent.Id
)
SELECT Id,
Path
FROM RCTE as hierarchy
Output
This produces the following output:
Id Path
1 /Root
2 /Root/FolderA
3 /Root/FolderA/SubfolderA
4 /Root/FolderA/SubfolderB
5 /Root/FolderB
6 /Root/FolderB/SubfolderA
7 /Root/FolderB/SubfolderB
Query
We can filter that output using a simple WHERE
clause:
SELECT Id
FROM MaterializedPaths
WHERE Path = '/Root/FolderA/SubfolderA'
Problem
The naïve approach works fine. The problem is that it's incredibly inefficient—and, thus, slow—for querying large hierarchies, as it needs to dynamically reconstruct the entire set of materialized paths each call. In my case, that takes 8-9 seconds. Obviously, I could just store this data in a table and regenerate it via a trigger at any time the data changes. But I'd rather find a more efficient query and avoid the additional complexity.
Question
What is an efficient way of constructing this query? Or, at the risk of making this an XY problem, is there a way to limit the rCTE so that it only needs to evaluate the nodes in the hierarchy, instead of reconstructing the entire hierarchy each time?