0

I started from this post: Detect consecutive dates and continued researching CTE and OVER and PARTITION BY. I have some data in a log that I have been running through programmatically and I am trying to optimize the process. Picture the sample data this way...

EventCode, TechCode, TaskCode, Time
2021003, 42, D, 2021-08-07 13:30
2021003, 42, E, 2021-08-07 13:30
2021003, 42, C, 2021-08-07 13:40
2021004, 43, D, 2021-08-07 13:35
2021004, 43, A, 2021-08-07 13:37
2021004, 43, X, 2021-08-07 13:38
2021004, 43, D, 2021-08-07 13:50
2021004, 43, E, 2021-08-07 13:50
2021004, 43, C, 2021-08-07 14:00

D,A, and E are start codes. D may not always be the first code in the group. It could also be A or E. C and X are stop codes. From this I would generate 3 rows:

Event,Tech,Start,Stop
2021003, 42, 2021-08-07 13:30, 2021-08-07 13:40
2021004, 43, 2021-08-07 13:35, 2021-08-07 13:38
2021004, 43, 2021-08-07 13:50, 2021-08-07 14:00

to represent the start and stop times of the tech on the event. If a tech worked on an event only once then it is easy enough to partition by eventcode, techcode.

;WITH CTE as (
Select EventCode, TechCode, MIN(Time) OVER (PARTITION BY EventCode, TechCode) as Start, 
MAX(Time) OVER (PARTITION BY EventCode, TechCode) as Stop, 
ROW_NUMBER() over (PARTITION BY EventCode, TechCode ORDER BY EventCode, TechCode, Time) as rownum
)
Select EventCode, TechCode, Start, Stop
from CTE
where rownum = 1

Can anyone see anyway to get one more level so that I can account for the 2 groups for tech 43 on event 2021004? Or is this one that I will just have to keep using my program to loop through the lines looking for stop and start codes to build each row?

Dale K
  • 25,246
  • 15
  • 42
  • 71
wb-wpb
  • 3
  • 1

1 Answers1

0

This is a type of gaps-and-islands problem.

There are a number of solutions. Here is one:

  • We identify the starting points of each group by looking at TaskCode for the current and previous row
  • We use a windowed COUNT to number the groups
  • We then simply group by EventCode TechCode and the new group number
WITH StartPoints AS (
    SELECT *,
        IsStart = CASE WHEN TaskCode IN ('D','A','E')
             AND LAG(TaskCode, 1, 'C') OVER (PARTITION BY EventCode, TechCode
                 ORDER BY Time) IN ('C', 'X') THEN 1 END
    FROM my_table t
),
Groupings AS (
    SELECT *,
        GroupId = COUNT(IsStart) OVER (PARTITION BY EventCode, TechCode
                  ORDER BY Time ROWS UNBOUNDED PRECEDING)
    FROM StartPoints t
)
SELECT
    EventCode,
    TechCode,
    MIN(Time) Start,
    MAX(Time) Stop
FROM Groupings t
GROUP BY
    EventCode, TechCode, GroupId;

db<>fiddle

Charlieface
  • 52,284
  • 6
  • 19
  • 43