Find Total Minutes Ignoring Overlap (Convert Cursor based Answer to CTE)

Question

There is an existing question that asked how to find how many minutes there are in multiple date ranges, ignoring overlaps.

The example data given is (userID isn't particularly relevant)

--Available--
ID  userID  availStart          availEnd
1   456     '2012-11-19 16:00'  '2012-11-19 17:00'
2   456     '2012-11-19 16:00'  '2012-11-19 16:50'
3   456     '2012-11-19 18:00'  '2012-11-19 18:30'
4   456     '2012-11-19 17:30'  '2012-11-19 18:10'
5   456     '2012-11-19 16:00'  '2012-11-19 17:10'
6   456     '2012-11-19 16:00'  '2012-11-19 16:50'

I can solve the problem using a cursor, but I think it should be adaptable to a CTE, however I can't figure out how to do it.

The method is to arrange each range by start time Then we build a range that coalesces ranges in order, until we find a range that doesn't overlap our coalesced range. We then calculate how many minutes are in the coalesced range, and remember this We carry on with the next ranges, again coalesing any that overlap. We accumulate minutes each time we get a non overlapping start point At the end we add the accumulated minutes onto the length of the last range

It's fairly easy to see that because of the order, once a range is distinct from what's gone before then no further ranges could overlap what's gone before, as their start dates are all greater.

Declare
  @UserID int = 456,
  @CurStart datetime, -- our current coalesced range start
  @CurEnd datetime, -- our current coalesced range end
  @AvailStart datetime, -- start or range for our next row of data
  @AvailEnd datetime, -- end of range for our next row of data
  @AccumMinutes int = 0 -- how many minutes so far accumulated by distinct ranges

Declare MinCursor Cursor Fast_Forward For
Select
  AvailStart, AvailEnd
From
  dbo.Available
Where
  UserID = @UserID
Order By
  AvailStart

Open MinCursor

Fetch Next From MinCursor Into @AvailStart, @AvailEnd
Set @CurStart = @AvailStart
Set @CurEnd = @AvailEnd

While @@Fetch_Status = 0
Begin
  If @AvailStart <= @CurEnd -- Ranges Overlap, so coalesce and continue
    Begin
    If @AvailEnd > @CurEnd 
      Set @CurEnd = @AvailEnd
    End
  Else -- Distinct range, coalesce minutes from previous range
  Begin
    Set @AccumMinutes = @AccumMinutes + DateDiff(Minute, @CurStart, @CurEnd)
    Set @CurStart = @AvailStart -- Start coalescing a new range
    Set @CurEnd = @AvailEnd
  End
  Fetch Next From MinCursor Into @AvailStart, @AvailEnd
End

Select @AccumMinutes + DateDiff(Minute, @CurStart, @CurEnd) As TotalMinutes

Close MinCursor
Deallocate MinCursor;

Got the CTE working, was just a silly error in the recursion. The query plan explosion is quite impressive:

With OrderedRanges as (
  Select
    Row_Number() Over (Partition By UserID Order By AvailStart) AS RN,
    AvailStart,
    AvailEnd
  From
    dbo.Available
  Where
    UserID = 456
),
AccumulateMinutes (RN, Accum, CurStart, CurEnd) as (
  Select
    RN, 0, AvailStart, AvailEnd
  From
    OrderedRanges
  Where 
    RN = 1
  Union All
  Select
    o.RN, 
    a.Accum + Case When o.AvailStart <= a.CurEnd Then
        0
      Else 
        DateDiff(Minute, a.CurStart, a.CurEnd)
      End,
    Case When o.AvailStart <= a.CurEnd Then 
        a.CurStart
      Else
        o.AvailStart
      End,
    Case When o.AvailStart <= a.CurEnd Then
        Case When a.CurEnd > o.AvailEnd Then a.CurEnd Else o.AvailEnd End
      Else
        o.AvailEnd
      End
  From
    AccumulateMinutes a
        Inner Join 
    OrderedRanges o On 
        a.RN = o.RN - 1
)

Select Max(Accum + datediff(Minute, CurStart, CurEnd)) From AccumulateMinutes

Is this adaptable to a CTE, and is there a general pattern for accumulating over a list in thie way?

http://sqlfiddle.com/#!6/ac021/2

seems identic to http://stackoverflow.com/questions/13464281/sum-of-minutes-between-multiple-date-ranges/13464594#13464594 — bummi, Nov 20 '12 at 22:03
@bummi I linked that question in the first line. I'm interested in the specific way of doing it, and whether accumulating over a list can be written in a standard way in a CTE. — Laurence, Nov 20 '12 at 22:05

score 6 · Accepted Answer · edited Nov 20 '12 at 23:20

The following query finds the periods in the data, according to your definition. It uses correlated subqueries first to determine whether a record is the start of a period (that is, no overlap with earlier time periods). It then assigns the "periodStart" as the most recent start that is the beginning of a non-overlapping period.

The following (untested) query takes this approach:

with TimeWithOverlap as (
     select t.*,
            (case when exists (select * from dbo.Available tbefore where t.availStart > tbefore.availStart and tbefore.availEnd >= t.availStart)
                  then 0
                  else 1
             end) as IsPeriodStart
     from dbo.Available t 
    ),
    TimeWithPeriodStart as (
     select two.*,
            (select MAX(two1.AvailStart) from TimeWithOverlap two1 where IsPeriodStart = 1 and two1.AvailStart <= two.AvailStart
            ) as periodStart
     from TimeWithOverlap two
    )
select periodStart, MAX(AvailEnd) as periodEnd
from TimeWithPeriodStart twps
group by periodStart;

http://sqlfiddle.com/#!6/3483c/20 (Second Query)

If two periods both start at the same time, then it still works, because the AvailStart values are the same. Because of the correlated subqueries, this might not perform very well on even medium sized data sets.

There are other methods for approaching this. For instance, if you had SQL Server 2012, you would be able to use cumulative sum functions, which offer a simpler method.

Gordon, thanks for that, fixed a couple of typos and tested it. It certainly works (and output the actual periods too, which is neat). I'm a bit surprised that the cursor algorithm can't be translated more directly. It is essentially an 1-pass accumulator approach (where the next value of the accumulator only depends on the current value and the current row in the table). It's almost like CTEs are too powerful for such a mundane algorithm. — Laurence, Nov 20 '12 at 23:25
@Laurence . . . The cursor algorithm can be applied more directly when you have cumulative sums. It still may not be one-pass, but cursors are generally so slow (inherently single threaded, moving data from database to application) that database-based alternative typically perform better. — Gordon Linoff, Nov 21 '12 at 14:05
Gordon, I've finally got my original CTE working (updated in question) Would be interested in your thought on the performance. I think the query plan looks OK with an index on userid, availstart. — Laurence, Nov 21 '12 at 16:38
The recursive CTE will probably perform better. It is worth comparing the performance of the two to be sure. — Gordon Linoff, Nov 21 '12 at 16:49

score 0 · Answer 2 · answered Nov 20 '12 at 22:37

I solved that (well, in a way) very efficiently by creating a dumb table having the date and time (accurate be the minute) in one column (PK) and a bit in the second. A '1' meant, the user is available and 0 meant, he/she's not.

The rest is dead simple. I was sick of having to write endless complicated queries in trying to get the minutes in partly overlapping time ranges.

In fact, this was for computing machine efficiency.

I know this is not the real deal but the most simple solution I came up with. You might create a function/SP which creates that table..

Find Total Minutes Ignoring Overlap (Convert Cursor based Answer to CTE)

2 Answers2

Linked