I am trying to evaluate the impact of store visitors on the spread of COVID-19.
Here is a simple scenario:
- VisitorA walks into store and meets Employee1 @ Time = 0.
- VisitorA then meets Employee2 @ Time = 1.
- VisitorB walks into store and meets Employee1 @ Time = 1.
- VisitorB then meets Employee3 @ Time = 2.
- VisitorA leaves store.
When I collect all visitor data and who they met over time the data set looks something like this:
Table visitorByEmployee
:
| VisitorID | EmployeeID | Contact |
+-----------+------------+-------------------+
| 100 | X123 | 3/11/2020 1:00 |
| 100 | X124 | 3/11/2020 1:10 |
| 101 | X123 | 3/12/2020 1:11 |
| 101 | X125 | 3/11/2020 1:20 |
| 102 | X126 | 3/12/2020 10:00 |
| 102 | X124 | 3/12/2020 10:00 |
| 103 | X123 | 3/12/2020 11:00 |
| 104 | X124 | 3/12/2020 12:00 |
| 104 | X126 | 3/12/2020 12:00 |
| 105 | X126 | 3/12/2020 12:00 |
I want to build a hierarchy off of this data that can eventually be expressed as follows:
Each Tree represents the impact of the Visitors on the spread of the virus:
100
--> X123
--> 101
--> X125
--> 103
--> X124
--> 104
102
--> X126
--> 104
--> 105
--> X124
--> 104
--> X126
I attempted to do this by first finding the root nodes (root visitors who were no impacted by previous visitors and/or employees they saw). These were 100 and 102.
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY Contact) AS SeenOrder
INTO
#SeenOrder
FROM
visitorByEmployee
SELECT *
INTO #RootVisitors
FROM #SeenOrder
WHERE SeenOrder = 1
From #RootVisitors
and #SeenOrder
, I want to build a table that can tell me that hierarchy of impact and maybe result in something like this:
| InitVisitorID | HLevel | EmployeeID | VisitorID |
+---------------+------------+-------------------+-------------+
| 100 | 0 | X123 | 100 |
| 100 | 0 | X124 | 100 |
| 100 | 1 | X123 | 101 |
| 100 | 1 | X123 | 103 |
| 100 | 1 | X124 | 104 |
| 100 | 2 | X125 | 101 |
| 102 | 0 | X126 | 102 |
| 102 | 0 | X124 | 102 |
| 102 | 1 | X126 | 104 |
| 102 | 1 | X126 | 105 |
| 102 | 1 | X124 | 104 |
| 102 | 2 | X126 | 104 |
Is this something that can be done using a recursive CTEs? I attempted to do this but due to the shifting hierarchy from visitor to employee to visitor to employee, I am having a hard time creating that recursive CTE.
UPDATE Here is the recursive CTE I am working on. It doesn't work yet but the approach is what I am sharing:
; WITH exposure_tree AS (
/* == Anchor with the root visitors == */
/* == You can think of this: The Employees who were exposed by the Visior == */
SELECT re.VisitorID InitVisitor,
1 as Level,
CASE WHEN 1%2=1 THEN 'Visitor' ELSE 'Employee' END ExposerType,
re.VisitorID Exposer,
re.EmployeeID Exposee,
re.SeenOrder,
re.InitialContact
FROM #SeenOrder re
WHERE re.SeenOrder = 1
/* == Recursive Part #1 ==
Get the visitors who were exposed next by the exposed employees
*/
UNION ALL
SELECT et.VisitorID InitVisitor,
Level + 1,
CASE WHEN (Level+1)%2=1 THEN 'Visitor' ELSE 'Employee' END ExposerType,
re.EmployeeID,
re.VisitorID, -- These are switched from the anchor.
re.SeenOrder,
re.InitialContact
FROM #SeenOrder re
JOIN exposure_tree et ON et.Exposee = re.EmployeeID AND re.SeenOrder > 1 AND re.InitialContact > et.InitialContact
UNION ALL
/* == Recursive Part #2 ==
Get the next employees who were exposed the second level exposed visitors
*/
SELECT et.VisitorID InitVisitor,
Level + 2,
CASE WHEN (Level+2)%2=1 THEN 'Visitor' ELSE 'Employee' END ExposerType,
re.VisitorID,
re.EmployeeID,
re.SeenOrder,
re.InitialContact
FROM #ROOT_EXPOSURES re
JOIN exposure_tree et ON re.VisitorID = et.Exposer and re.SeenOrder > 1 AND re.InitialContact > et.InitialContact
)
select top 1000 * from exposure_tree ORDER BY InitVisitor, Level