0

I have a table like this in SQL Server:

id  start_time  end_time
1   10:00:00    10:34:00
2   10:38:00    10:52:00
3   10:53:00    11:23:00
4   11:24:00    11:56:00
5   14:20:00    14:40:00
6   14:41:00    14:59:00
7   15:30:00    15:40:00

What I would like to have is a query that outputs consolidated records based on the time difference between two consecutive records (end_time of row n and start_time row n+1) . All records where the time difference is less than 2 minutes should be combined into one time entry and the ID of the first record should be kept. This should also combine more than two records if multiple consecutive records have a time difference less than 2 minutes.

This would be the expected output:

id  start_time  end_time
1   10:00:00    10:34:00
2   10:38:00    11:56:00
5   14:20:00    14:59:00
7   15:30:00    15:40:00

Thanks in advance for any tips how to build the query.

Edit: I started with following code to calculate the lead_time and the time difference but do not know how to group and consolidate.

WITH rows AS
        (
        SELECT  *, ROW_NUMBER() OVER (ORDER BY Id) AS rn
        FROM #temp
        )
SELECT  mc.id, mc.start_time, mc.end_time, mp.start_time lead_time, DATEDIFF(MINUTE, mc.[end_time], mp.[start_time]) as DiffToNewSession
FROM    rows mc
LEFT JOIN    rows mp
ON      mc.rn = mp.rn - 1
Raffael
  • 51
  • 4
  • You could take a look at [this answer](https://stackoverflow.com/a/5101424/4949005). It uses `Group by` and it could work. – smoksnes May 05 '21 at 05:59
  • 1
    *based on the time difference between two consecutive records* - You missed the definition of time difference, because there are two time columns. – astentx May 05 '21 at 06:07
  • @astentx, the OP means the difference between the "end_time" of one row and the "start_time" of the next row .... which is how he gets a row with ID=2, start_time = 10:30, end_time = 11:56 .... row with ID 3 starts 1 minute after row with ID 2 ends, row 4 starts 1 minute after row 3 ends, but then row 5 starts 24 minutes after row 4 ends, thus the reason why (in the "expected output") rows 5 gets its own row – Craig May 05 '21 at 06:13

3 Answers3

4

The window function in t-sql can realize a lot of data statistics, such as

create table #temp(id int identity(1,1), start_time time, end_time time)
insert into #temp(start_time, end_time)
values  ('10:00:00', '10:34:00')
      , ('10:38:00', '10:52:00')
      , ('10:53:00', '11:23:00')
      , ('11:24:00', '11:56:00')
      , ('14:20:00', '14:40:00')
      , ('14:41:00', '14:59:00')
      , ('15:30:00', '15:40:00')

;with c0 as(
select *, LAG(end_time,1,'00:00:00') over (order by id) as lag_time
from #temp
), c1 as(
select *, case when DATEDIFF(MI, lag_time, start_time) <= 2 then 1 else -0 end as gflag
from c0
), c2 as(
select *, SUM(case when gflag=0 then 1 else 0 end) over(order by id) as gid
from c1
)
select MIN(id) as id, MIN(start_time) as start_time, MAX(end_time) as end_time
from c2
group by gid

In order to better describe the process of data construction, I simply use c0, c1, c2... to represent levels, you can merge some levels and optimize. If you can’t use id as a sorting condition, then you need to change the sorting part in the above statement.

Brucelin Michael
  • 475
  • 4
  • 10
  • works perfectly. For me it is easier to understand the recursive solution, though. but thanks! – Raffael May 05 '21 at 08:15
  • You are welcome, personally, highly recommended to use the "collection" idea to solve SQL problems, and recursion, loop (cursor) these are better used in the program. – Brucelin Michael May 05 '21 at 08:49
  • @Raffael . . . As the OP, you can of course choose any answer you want. My vote goes to this version, both for clarity and performance. – Gordon Linoff May 05 '21 at 12:07
  • after some additional work and added sorting criteria (I have some other columns than only the ID) I changed from the recursive solution to this one as you can easily step by step follow the data construction. – Raffael May 05 '21 at 15:32
1

You can use a recursive cte to get the result that you want. This method just simple compare current end_time with next start_time. If it is less than the 2 mintues threshold use the same start_time as grp_start. And the end, simple do a GROUP BY on the grp_start

with rcte as
(
    -- anchor member
    select  *, grp_start = start_time
    from    tbl
    where   id  = 1

    union all

    -- recursive member
    select  t.id, t.start_time, t.end_time, 
            grp_start = case when datediff(second, r.end_time, t.start_time) <= 120
                             then r.grp_start
                             else t.start_time
                             end
    from    tbl t
            inner join rcte r   on  t.id    = r.id + 1
)
select  id = min(id), grp_start as start_time, max(end_time) as end_time
from    rcte 
group by grp_start

demo

Squirrel
  • 23,507
  • 4
  • 34
  • 32
0

I guess this should do the trick without recursion. Again I used several ctes in order to make the solution a bit easier to read. guess it can be reduced a little...

INSERT INTO T1 VALUES
(1,'10:00:00','10:34:00')
,(2,'10:38:00','10:52:00')
,(3,'10:53:00','11:23:00')
,(4,'11:24:00','11:56:00')
,(5,'14:20:00','14:40:00')
,(6,'14:41:00','14:59:00')
,(7,'15:30:00','15:40:00')
GO

WITH cte AS(
SELECT *
      ,ROW_NUMBER() OVER (ORDER BY id) AS rn
      ,DATEDIFF(MINUTE, ISNULL(LAG(endtime) OVER (ORDER BY id), starttime), starttime) AS diffMin
      ,COUNT(*) OVER (PARTITION BY (SELECT 1)) as maxRn
  FROM T1
),
cteFirst AS(
SELECT *
  FROM cte
  WHERE rn = 1 OR diffMin > 2
),
cteGrp AS(
SELECT *
      ,ISNULL(LEAD(rn) OVER (ORDER BY id), maxRn+1) AS nextRn
  FROM cteFirst
)
SELECT f.id, f.starttime, MAX(ISNULL(n.endtime, f.endtime)) AS endtime
  FROM cteGrp f
  LEFT JOIN cte n ON n.rn >= f.rn AND n.rn < f.nextRn
  GROUP BY f.id, f.starttime
Tyron78
  • 4,117
  • 2
  • 17
  • 32
  • [`LEAD`](https://learn.microsoft.com/en-us/sql/t-sql/functions/lead-transact-sql?view=sql-server-ver15) function accepts 3 arguments: value, offset and default value, which is used when the function goes beyond the window boundary. So `IFNULL(LEAD(expr1) ..., expr2)` can be rewritten with just `LEAD`, that is more readable: `LEAD(expr1, 1, expr2)` ...` – astentx May 05 '21 at 07:42
  • This works only if the last two rows should not be combined. If the last row has a time difference below the treshhold, it is not considered as the end_time for the second last row and also is not included in the result individually. – Raffael May 05 '21 at 08:05