I have a table consisting of about 70,000 rows and two columns (both VARCHAR(16)
): id
and parent_id
.
I'd like to populate a 'depth' column that shows how far a particular record is from the 'root' node.
e.g.
id,parent_id,depth
A,NULL,0
B,A,1
C,A,1
D,B,2
E,D,3
etc.
I started by writing a query based on this answer to a similar question:
WITH myCTE(id, depth) AS
(
SELECT id, 0 FROM objects where id = 'A'
UNION ALL
SELECT objects.id, depth + 1 FROM myCTE JOIN objects ON objects.parent_id = myCTE.id
)
SELECT id, depth FROM myCTE
With my dataset (~80,000 rows) the above takes almost two hours to execute!
I then wrote my query as a loop and got far better performance:
ALTER TABLE objects ADD depth INT NULL
DECLARE @counter int
DECLARE @total int
SET @counter = 0
UPDATE objects SET depth = 0 WHERE id = 'A'
SELECT @total = COUNT(*) FROM objects WHERE depth IS NULL
WHILE (@total > 0)
BEGIN
UPDATE objects SET depth = @counter + 1 WHERE parent_id IN (
SELECT id FROM objects WHERE depth = @counter
)
SELECT @total = COUNT(*) FROM objects WHERE depth IS NULL
SET @counter = @counter + 1
END
The above code only takes a couple of minutes (and it has the benefit of adding the results to the existing table)
My question is whether my results are typical of using a CTE for this problem or whether there is something I have overlooked that might explain it? Indexes, maybe? (I have none on the table right now)