I'm experimenting with SQL Common Table Expressions using a sample hierarchy of cities, countries and continents and which have been visited and which haven't.
The table t_hierarchy
looks like this:
(NOTE: The visited
column is deliberately NULL for non-cities because I want this to be a dynamically calculated percentage.)
I have then used the following SQL to create a recursive result set based on the data in t_hierarchy
:
WITH myCTE (ID, name, type, parentID, visited, Depth)
AS
(
Select ID, name, type, parentID, visited, 0 as Depth From t_hierarchy where parentID IS NULL
UNION ALL
Select t_hierarchy.ID, t_hierarchy.name, t_hierarchy.type, t_hierarchy.parentID, t_hierarchy.visited, Depth + 1
From t_hierarchy
inner join myCte on t_hierarchy.parentID = myCte.ID
)
Select ID, name, type, parentID, Depth, cnt.numDirectChildren, visited
FROM myCTE
LEFT JOIN (
SELECT theID = parentID, numDirectChildren = COUNT(*)
FROM myCTE
GROUP BY parentID
) cnt ON cnt.theID = myCTE.ID
order by ID
The result looks like this:
What I would like to do now, which I am struggling with, is to create a column, e.g. visitedPercentage
to show the percentage of cities visited for each 'level' of the hierarchy (treating cities differently to countries and continents). To explain, working our way up the 'tree':
- Madrid would be 100% because it has been visited (
visited
= 1) - Barcelona would be 0% because it has not been visited (
visited
= 0) - Spain would therefore be 50% because it has 2 direct children and one is 100% and the other is 0%
- Europe would therefore be 50% because Spain is 50%, France is 100% (Paris has been visited), and Germany is 0% (Berlin has not been visited)
I hope this makes sense. I kind of want to say "if it's not a city, work out the visitedPercentage
of THIS level based on the visitedPercentage
of all direct children, otherwise just show 100% or 0%. Any guidance is much appreciated.
UPDATE: I've managed to progress it a bit further using Daniel Gimenez's suggestion to the point where I've got France 100, Spain 50 etc. But the top level items (e.g. Europe) are still 0, like this:
I think this is because the calculation is being done after the recursive part of the query, rather than within it. I.e. this line:
SELECT... , visitPercent = SUM(CAST visited AS int) / COUNT(*) FROM myCTE GROUP BY parentID
is saying "look at the visited
column for child objects, calculate the SUM of the values, and show the result as visitPercent
", when it should be saying "look at the existing visitPercent
value from the previous calculation", if that makes sense. I've no idea where to go from here! :)