7

The scenario is users specify when they are available, these specified times can overlap each other. I'm trying to get the total time they are available for. Example with SQL Fiddle:

--Available--
ID  userID  availStart          availEnd
1   456     '2012-11-19 16:00'  '2012-11-19 17:00'
2   456     '2012-11-19 16:00'  '2012-11-19 16:50'
3   456     '2012-11-19 18:00'  '2012-11-19 18:30'
4   456     '2012-11-19 17:30'  '2012-11-19 18:10'
5   456     '2012-11-19 16:00'  '2012-11-19 17:10'
6   456     '2012-11-19 16:00'  '2012-11-19 16:50'

The output should be 130 minutes:

1: 60
2: 0 as falls inside 1
3: 30
4: 30 as the last 10 mins is covered by 3
5: 10 as first 60 mins is covered by 1
6: 0 as falls inside 1

I can get the total overlapping minutes, however this is more than the SUM of the available minutes:

SQL Fiddle

Any ideas how I can achieve this?

EDIT 21st Nov 12: Thanks so far for everyone's solutions - in a way I'm happy to see this wasn't an 'easy' query to write.

EDIT 23rd Nov 12: This is all great work. Internally here we're thinking it might be best to ensure users cannot enter overlapping times (eg forcing them to amend an existing entry)!

Marcus
  • 9,011
  • 10
  • 45
  • 65
  • 1
    I raised a separate question about whether the cursor method can be converted to a CTE http://stackoverflow.com/questions/13482921/find-total-minutes-ignoring-overlap-convert-cursor-based-answer-to-cte – Laurence Nov 20 '12 at 21:58
  • I did some performance analysis on all the algorithms so far https://docs.google.com/spreadsheet/ccc?key=0Ar3ZD6o7Zd8_dGZXdW1QWmhKTWhiZV9BOVZZUEtKQXc – Laurence Nov 22 '12 at 22:14
  • So you're looking for the summation of contiguous minutes - or the summation of minutes of a given set. It would probably be easier to find the min max minutes and subtract the number of minutes not allocated. – SQLMason Nov 22 '12 at 22:16
  • @DanAndrews It's more like the total number of minutes in the union of date ranges. Finding the holes would seem to be as hard as figuring out the overlaps. Code talks of course! – Laurence Nov 22 '12 at 23:07
  • @Laurence was that comparison done on your own server? SQLFiddle is reporting higher run times than those! – Marcus Nov 23 '12 at 17:15
  • Yep, that was done on my laptop - Core i7 X920 2GHz, and SSDs. – Laurence Nov 23 '12 at 19:57

5 Answers5

4

Gordon Linoff has a CTE based answer

I've done some performance analysis on all the working algorithms Blank values mean it took too long. This is tested on a single Core i7 X920 @2GHz chip, backed by a couple of SSDs. The only index created was a cluster on UserID, AvailStart. If you think you can improve any of the performance, let me know.

This CTE version was worse than linear, SQL Server can't do the RN = RN + 1 join in an efficient way. I rectified this with a hybrid approach below, where I save and index the first CTE into a table variable. This still takes ten times as much IO as the cursor based approach.

With OrderedRanges as (
  Select
    Row_Number() Over (Partition By UserID Order By AvailStart) AS RN,
    AvailStart,
    AvailEnd
  From
    dbo.Available
  Where
    UserID = 456
),
AccumulateMinutes (RN, Accum, CurStart, CurEnd) as (
  Select
    RN, 0, AvailStart, AvailEnd
  From
    OrderedRanges
  Where 
    RN = 1
  Union All
  Select
    o.RN, 
    a.Accum + Case When o.AvailStart <= a.CurEnd Then
        0
      Else 
        DateDiff(Minute, a.CurStart, a.CurEnd)
      End,
    Case When o.AvailStart <= a.CurEnd Then 
        a.CurStart
      Else
        o.AvailStart
      End,
    Case When o.AvailStart <= a.CurEnd Then
        Case When a.CurEnd > o.AvailEnd Then a.CurEnd Else o.AvailEnd End
      Else
        o.AvailEnd
      End
  From
    AccumulateMinutes a
        Inner Join 
    OrderedRanges o On 
        a.RN = o.RN - 1
)

Select Max(Accum + datediff(Minute, CurStart, CurEnd)) From AccumulateMinutes 

http://sqlfiddle.com/#!6/ac021/2

After doing some performance analysis, here's a hybrid CTE/table variable version that performs better than anything except the cursor based approach

Create Function dbo.AvailMinutesHybrid(@UserID int) Returns Int As
Begin

Declare @UserRanges Table (
  RN int not null primary key, 
  AvailStart datetime, 
  AvailEnd datetime
)
Declare @Ret int = Null

;With OrderedRanges as (
  Select
    Row_Number() Over (Partition By UserID Order By AvailStart) AS RN,
    AvailStart,
    AvailEnd
  From
    dbo.Available
  Where
    UserID = @UserID
)
Insert Into @UserRanges Select * From OrderedRanges


;With AccumulateMinutes (RN,Accum, CurStart, CurEnd) as (
  Select
    RN, 0, AvailStart, AvailEnd
  From
    @UserRanges
  Where 
    RN = 1
  Union All
  Select
    o.RN, 
    a.Accum + Case When o.AvailStart <= a.CurEnd Then
        0
      Else 
        DateDiff(Minute, a.CurStart, a.CurEnd)
      End,
    Case When o.AvailStart <= a.CurEnd Then 
        a.CurStart
      Else
        o.AvailStart
      End,
    Case When o.AvailStart <= a.CurEnd Then
        Case When a.CurEnd > o.AvailEnd Then a.CurEnd Else o.AvailEnd End
      Else
        o.AvailEnd
      End
  From
    AccumulateMinutes a
        Inner Join 
    @UserRanges o On 
        a.RN + 1 = o.RN
)

Select 
  @Ret = Max(Accum + datediff(Minute, CurStart, CurEnd)) 
From 
  AccumulateMinutes 
Option
  (MaxRecursion 0)

Return @Ret

End

http://sqlfiddle.com/#!6/bfd94

Community
  • 1
  • 1
Laurence
  • 10,896
  • 1
  • 25
  • 34
  • Thanks for pointing out a flaw in my answer. I'm not sure it can be amended without changing the approach, so I've just deleted it. Meanwhile, I think both your answers show that this problem is hard to solve using a purely set-based approach. I mean, a recursive CTE is an iteration essentially, just more nicely shaped as a single query. True, Gordon *has* come up with a set-based solution, but he had to reference the same (not very light) CTE twice, the thing I tried to avoid (and in the end, failed to account for all possible cases). Anyway, both methods *are* solutions, so I've voted both. – Andriy M Nov 22 '12 at 08:27
  • I give up)).Your approach the best. +1 for both answers. By the way I update answer.Thanks – Aleksandr Fedorenko Nov 22 '12 at 15:29
2

The main problem is that you can have chains of overlapping entries, so you need to combine an indefinite amount of times to remove all the overlap - this is more suited to a procedural method than SQL. But if you would prefer to not use temporary tables, here's a CTE method - keep in mind that CTEs can only recurse a given number of times, so if you have any particularly long chains, it will fail.

WITH MergedAvailable
AS
(
  SELECT Available.UserID, Available.AvailStart, MAX(Available.AvailEnd) AS AvailEnd
    FROM Available
   WHERE (
           SELECT COUNT(*)
             FROM Available AS InnerAvailable
            WHERE InnerAvailable.AvailStart < Available.AvailStart
                  AND
                  InnerAvailable.AvailEnd >= Available.AvailStart
         ) = 0
   GROUP BY Available.UserID, Available.AvailStart
  UNION ALL
  SELECT MergedAvailable.UserID, MergedAvailable.AvailStart,
         LongestExtensionToAvailableInterval.NewIntervalEnd
    FROM MergedAvailable
   CROSS APPLY GetLongestExtensionToAvailableInterval(MergedAvailable.UserID,
               MergedAvailable.AvailStart,
               MergedAvailable.AvailEnd) AS LongestExtensionToAvailableInterval
   WHERE LongestExtensionToAvailableInterval.NewIntervalEnd IS NOT NULL
)

SELECT SUM(DATEDIFF(MINUTE,
                    FinalAvailable.AvailStart,
                    FinalAvailable.AvailEnd)) AS MinsAvailable
  FROM (
         SELECT MergedAvailable.UserID, MergedAvailable.AvailStart,
                MAX(MergedAvailable.AvailEnd) AS AvailEnd
           FROM MergedAvailable
          GROUP BY MergedAvailable.UserID, MergedAvailable.AvailStart
       ) AS FinalAvailable

This table function is required:

CREATE FUNCTION GetLongestExtensionToAvailableInterval
(
  @UserID int,
  @CurrentIntervalStart datetime,
  @CurrentIntervalEnd datetime
)
RETURNS TABLE
AS
RETURN 
  SELECT MAX(Available.AvailEnd) AS NewIntervalEnd
    FROM Available
   WHERE Available.UserID = @UserID
         AND
         Available.AvailStart > @CurrentIntervalStart
         AND
         Available.AvailStart <= @CurrentIntervalEnd
         AND
         Available.AvailEnd > @CurrentIntervalEnd

The general idea is that it starts from all ranges where the start of the range isn't overlapping anything, and then with every recursion it extends the current range to the furthest extent of the currently overlapping ranges. The table function is needed to determine the furthest extent, as recursing sections of CTEs are not allowed to included plain aggregates.

With the data you've provided, the starting rows are:

456 2012-11-19 16:00 2012-11-19 17:10
456 2012-11-19 17:30 2012-11-19 18:10

The only row which ends up being added via the recursion is:

456 2012-11-19 17:30 2012-11-19 18:30

For the sake of the example, say you had a row with ID 7 which went from 18:20 to 19:20. Then there would be a second recursion which brought back the row:

456 2012-11-19 17:30 2012-11-19 19:20

So while the query will get to the start and end of each overlapping range, it will also be bringing back all the intermediate stages. This is why we need to take the aggregate maximum end date for each start date after the CTE, to remove them.

  • If you order the ranges by start time, then only need one pass. You coalesce until you find a disjoint range, then accumulate the minutes from the previous coalesced range. Not sure if that can help simplify the CTE. – Laurence Nov 20 '12 at 22:02
  • I'm not sure what you're referring to by one pass here, sorry? In theory, you could run the CTE over the rows in starting time order, but I don't think it would run very well (it certainly wouldn't be one pass). – Tristan Wilkinson Nov 20 '12 at 23:10
  • See my answer that uses a cursor, It maintains an accumulator and only goes through each row once. I've had no luck converting it to a CTE, though. I don't know if it makes it any easier. – Laurence Nov 20 '12 at 23:13
  • CTEs can't run across individual rows in the same way that a procedural method can, it would need to run a separate query for each one. (This would also mean that all participating rows would show up as results and you would need to disregard all of them but the last one, assuming that you were accumulating the value across the result set.) – Tristan Wilkinson Nov 20 '12 at 23:29
  • There's something about this query that can cause the plan to explode. See http://sqlfiddle.com/#!6/5f3cf/2 for example. It seems to be something to do with the density of intervals, If I spread them out over a year, I can happily get results with thousands of rows. Is there some other indexing scheme that will help other than (userid, availstart)? – Laurence Nov 22 '12 at 13:48
  • Because the CTE attachs every possible interval which can extend the current interval with every recursion, if you have a lot of overlapping rows all within a single interval as in the new example data, it will perform horribly. The raw data returned by the CTE has 723976 rows here. Unfortunately, you can't run aggregates within the recursing part of a CTE, so I'm not aware of any way to not have this behaviour currently. I don't think that adjusting the indexing would help here. – Tristan Wilkinson Nov 26 '12 at 00:24
  • It turns out that CTEs don't have any problem with including table functions which run aggregates in the recursing part. I'll add a version taking advantage of that shortly. – Tristan Wilkinson Nov 26 '12 at 00:40
  • I updated the performance spreadsheet with this. It's clearly the fastest version that avoids cursors or temporary tables. https://docs.google.com/spreadsheet/ccc?key=0Ar3ZD6o7Zd8_dGZXdW1QWmhKTWhiZV9BOVZZUEtKQXc – Laurence Nov 28 '12 at 17:07
2

Here's another way of doing it with a cursor. I feel this techinique should be adaptable to a CTE, but I can't figure out how to do it

The method is to arrange each range by start time Then we build a range that coalesces ranges in order, until we find a range that doesn't overlap our coalesced range. We then calculate how many minutes are in the coalesced range, and remember this We carry on with the next ranges, again coalesing any that overlap. We accumulate minutes each time we get a non overlapping start point At the end we add the accumulated minutes onto the length of the last range

It's fairly easy to see that because of the order, once a range is distinct from what's gone before then no further ranges could overlap what's gone before, as their start dates are all greater.

Declare
  @UserID int = 456,
  @CurStart datetime, -- our current coalesced range start
  @CurEnd datetime, -- our current coalsced range end
  @AvailStart datetime, -- start or range for our next row of data
  @AvailEnd datetime, -- end of range for our next row of data
  @AccumMinutes int = 0 -- how many minutes so far accumulated by distinct ranges

Declare MinCursor Cursor Fast_Forward For
Select
  AvailStart, AvailEnd
From
  dbo.Available
Where
  UserID = @UserID
Order By
  AvailStart

Open MinCursor

Fetch Next From MinCursor Into @AvailStart, @AvailEnd
Set @CurStart = @AvailStart
Set @CurEnd = @AvailEnd

While @@Fetch_Status = 0
Begin
  If @AvailStart <= @CurEnd -- Ranges Overlap, so coalesce and continue
    Begin
    If @AvailEnd > @CurEnd 
      Set @CurEnd = @AvailEnd
    End
  Else -- Distinct range, coalesce minutes from previous range
  Begin
    Set @AccumMinutes = @AccumMinutes + DateDiff(Minute, @CurStart, @CurEnd)
    Set @CurStart = @AvailStart -- Start coalescing a new range
    Set @CurEnd = @AvailEnd
  End
  Fetch Next From MinCursor Into @AvailStart, @AvailEnd
End

Select @AccumMinutes + DateDiff(Minute, @CurStart, @CurEnd) As TotalMinutes

Close MinCursor
Deallocate MinCursor;

http://sqlfiddle.com/#!6/3483c/15

Laurence
  • 10,896
  • 1
  • 25
  • 34
  • Thanks Laurence, I can just about get my head around this. This 'small' issue is part of a much bigger SQL query calculating available times for users - if I go down this route I'll either use the CTE or wrap this in a table function. – Marcus Nov 21 '12 at 11:59
  • 1
    You can wrap this in a UDF like so http://sqlfiddle.com/#!6/c54b1/1. However, I'm not sure it would be a good idea if you were calling it on large result sets. – Laurence Nov 21 '12 at 13:29
2

Condition t1.availStart > t2.availEnd OR t1.availEnd < t2.availStart check period which will never be crossed. If it is crossed then minimum availStart or maximum availEnd esle availStart or availEnd.

Probably more than one crossing period.
In your case it 
16:00:00 - 17:10:00 includes the ranges:16:00:00 - 16:50:00,
                                        16:00:00 - 16:50:00,
                                        16:00:00 - 17:00:00,
                                        16:00:00 - 17:10:00
17:30:00 - 18:30:00 includes the ranges:17:30:00 - 18:10:00,
                                        18:00:00 - 18:30:00

UPDATE 21.11.2012; 30.11.2012; 04.01.2013

CREATE FUNCTION dbo.Overlap
 (
  @availStart datetime,
  @availEnd datetime,
  @availStart2 datetime,
  @availEnd2 datetime
  )
RETURNS TABLE
RETURN
  SELECT CASE WHEN @availStart >= @availEnd2 OR @availEnd <= @availStart2
              THEN @availStart ELSE
                               CASE WHEN @availStart > @availStart2 THEN @availStart2 ELSE @availStart END
                               END AS availStart,
         CASE WHEN @availStart >= @availEnd2 OR @availEnd <= @availStart2
              THEN @availEnd ELSE
                             CASE WHEN @availEnd > @availEnd2 THEN @availEnd ELSE @availEnd2 END
                             END AS availEnd

;WITH cte AS
 (
  SELECT userID, availStart, availEnd, ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY AvailStart) AS Id
  FROM dbo.test53
  ), cte2 AS
 (
  SELECT Id, availStart, availEnd
  FROM cte
  WHERE Id = 1
  UNION ALL
  SELECT c.Id, o.availStart, o.availEnd
  FROM cte c JOIN cte2 ct ON c.Id = ct.Id + 1
             CROSS APPLY dbo.Overlap(c.availStart, c.availEnd, ct.availStart, ct.availEnd) AS o
  )
  SELECT TOP 1 SUM(DATEDIFF(minute, availStart, MAX(availEnd))) OVER()
  FROM cte2
  GROUP BY availStart

Demo on SQLFiddle

Aleksandr Fedorenko
  • 16,594
  • 6
  • 37
  • 44
  • Thanks Alexander, this looks excellent. Can you please explain why the OVER clause is significant? – Marcus Nov 21 '12 at 12:02
  • That's neat, but it doesn't work if you have ranges like http://sqlfiddle.com/#!3/12aea/2 : e.g. 17:00 - 18:00, 17:00 - 17:10, 17:50 - 18:00. Considered as a whole they all share minutes, but the second and third range don't overlap. – Laurence Nov 21 '12 at 13:18
  • 1
    I'm amazed at the variety of queries and query plans that solve this problem. The plan for the new solution is fairly scary! – Laurence Nov 21 '12 at 16:49
  • Good job @Laurence. You are magnificent;) I update my answer. – Aleksandr Fedorenko Nov 21 '12 at 16:49
  • This still doesn't work, I was doing some timings on larger data sets, and it got a different answer to the other queries. Here's the simplest example I can find: http://sqlfiddle.com/#!3/176e3/1 It's weird, because removing any one row seems to make it work, and the times are only six consecutive hours... – Laurence Nov 21 '12 at 23:47
  • @AlexanderFedorenko Here's an example that breaks the latest version: http://sqlfiddle.com/#!3/57e46/1 (Not deliberately picking on you, just doing some performance tests and noticed the discrepancy!) – Laurence Nov 22 '12 at 16:55
  • @Laurence test my new script please;) – Aleksandr Fedorenko Nov 30 '12 at 16:35
1
Create Table #Available (
  ID int not null primary key,
  UserID int not null,
  AvailStart datetime not null,
  AvailEnd datetime not null
)


Insert Into #Available (ID,UserID, AvailStart, AvailEnd) Values
  (1,456, '2012-11-19 16:00', '2012-11-19 17:00'),
  (2,456, '2012-11-19 16:00', '2012-11-19 16:50'),
  (3,456, '2012-11-19 18:00', '2012-11-19 18:30'),
  (4,456, '2012-11-19 17:30', '2012-11-19 18:10'),
  (5,456, '2012-11-19 16:00', '2012-11-19 17:10'),
  (6,456, '2012-11-19 16:00', '2012-11-19 16:50'),
  (7,457, '2012-11-19 16:00', '2012-11-19 17:10'),
  (8,457, '2012-11-19 16:00', '2012-11-19 16:50');  
Select Distinct UserID 
into #users
from #Available


Create Table #mins(UserID int,atime datetime,aset tinyint )
Declare @start Datetime
Declare @end Datetime

Select @start=min(AvailStart),@end=max(AvailEnd) from #Available 
While @start<@end
    begin
     insert into #mins(UserID,atime) 
     Select UserID ,@Start from #users
     Select @start=DateAdd(mi,1,@start)
    end

update #mins set aset=1
from #Available
where atime>=AvailStart and atime<Availend and #mins.UserID = #Available.UserID


select UserID,SUM(aset) as [Minutes] 
from #mins
Group by UserID 
Drop table #Available
Drop table #mins
Drop table #users
bummi
  • 27,123
  • 14
  • 62
  • 101
  • Thanks - I see how you're doing that there with inserting individual minutes when they are available. I'm hoping theres another way to do this without having to use temp tables altogether. – Marcus Nov 20 '12 at 00:04
  • Hope so for you, so problem is that there can not only be overlappings but also spare times between (sorry 4 my bad english) – bummi Nov 20 '12 at 00:09
  • Thanks very much for this solution, it's the most simple for me however I'm looking to avoid temp tables! – Marcus Nov 21 '12 at 11:57
  • Alexander Fedorenko has the one you need :-) – bummi Nov 21 '12 at 12:19