2

Requirements

  1. I have data table that saves data in date ranges.
  2. Each record is allowed to overlap previous record(s) (record has a CreatedOn datetime column).
  3. New record can define it's own date range if it needs to hence can overlap several older records.
  4. Each new overlapping record overrides settings of older records that it overlaps.

Result set

What I need to get is get per day data for any date range that uses record overlapping. It should return a record per day with corresponding data for that particular day.

To convert ranges to days I was thinking of numbers/dates table and user defined function (UDF) to get data for each day in the range but I wonder whether there's any other (as in better* or even faster) way of doing this since I'm using the latest SQL Server 2008 R2.

Stored data

Imagine my stored data looks like this

ID | RangeFrom | RangeTo  | Starts | Ends  | CreatedOn (not providing data)
---|-----------|----------|--------|-------|-----------
1  | 20110101  | 20110331 | 07:00  | 15:00
2  | 20110401  | 20110531 | 08:00  | 16:00
3  | 20110301  | 20110430 | 06:00  | 14:00 <- overrides both partially

Results

If I wanted to get data from 1st January 2011 to 31st May 2001 resulting table should look like the following (omitted obvious rows):

DayDate | Starts | Ends
--------|--------|------
20110101| 07:00  | 15:00  <- defined by record ID = 1
20110102| 07:00  | 15:00  <- defined by record ID = 1
...                          many rows omitted for obvious reasons
20110301| 06:00  | 14:00  <- defined by record ID = 3
20110302| 06:00  | 14:00  <- defined by record ID = 3
...                          many rows omitted for obvious reasons
20110501| 08:00  | 16:00  <- defined by record ID = 2
20110502| 08:00  | 16:00  <- defined by record ID = 2
...                          many rows omitted for obvious reasons
20110531| 08:00  | 16:00  <- defined by record ID = 2
Robert Koritnik
  • 103,639
  • 52
  • 277
  • 404
  • I notice in your sample output you have one row per day. If there are two rows that overlap for the same day, what is the rule by which only one should be displayed or should both be displayed? – Thomas Mar 19 '11 at 20:36
  • @Thomas: See **AD 4** in my question. Every new record overrides older records that it overlaps with. – Robert Koritnik Mar 19 '11 at 20:44

2 Answers2

6

Actually, since you are working with dates, a Calendar table would be more helpful.

Declare @StartDate date
Declare @EndDate date

;With Calendar As
    (
    Select @StartDate As [Date]
    Union All
    Select DateAdd(d,1,[Date])
    From Calendar
    Where [Date] < @EndDate
    )
Select ...
From Calendar
    Left Join MyTable
        On Calendar.[Date] Between MyTable.Start And MyTable.End
Option ( Maxrecursion 0 );

Addition

Missed the part about the trumping rule in your original post:

Set DateFormat MDY;
Declare @StartDate date = '20110101';
Declare @EndDate date = '20110501';

-- This first CTE is obviously to represent
-- the source table
With SampleData As 
    (
    Select 1 As Id
        , Cast('20110101' As date) As RangeFrom
        , Cast('20110331' As date) As RangeTo
        , Cast('07:00' As time) As Starts
        , Cast('15:00' As time) As Ends
        , CURRENT_TIMESTAMP As CreatedOn
    Union All Select 2, '20110401', '20110531', '08:00', '16:00', DateAdd(s,1,CURRENT_TIMESTAMP )
    Union All Select 3, '20110301', '20110430', '06:00', '14:00', DateAdd(s,2,CURRENT_TIMESTAMP )
    )
    , Calendar As
    (
    Select @StartDate As [Date]
    Union All
    Select DateAdd(d,1,[Date])
    From Calendar
    Where [Date] < @EndDate
    )
    , RankedData As
    (
    Select C.[Date]
        , S.Id
        , S.RangeFrom, S.RangeTo, S.Starts, S.Ends
        , Row_Number() Over( Partition By C.[Date] Order By S.CreatedOn Desc ) As Num
    From Calendar As C
        Join SampleData As S
            On C.[Date] Between S.RangeFrom And S.RangeTo
    )
Select [Date], Id, RangeFrom, RangeTo, Starts, Ends
From RankedData
Where Num = 1   
Option ( Maxrecursion 0 );

In short, I rank all the sample data preferring the newer rows that overlap the same date.

Andriy M
  • 76,112
  • 17
  • 94
  • 154
Thomas
  • 63,911
  • 12
  • 95
  • 141
  • A user defined function with two parameters returning dates table even better! :) But this is just one part of the problem. This will convert my date ranges to day records. But how do I account for overlapping records? – Robert Koritnik Mar 19 '11 at 15:47
  • @Robert Koritnik - First, some schema information, sample inputs and expected outputs would be helpful. Second, is it that you are trying to locate, by day, scenarios where multiple rows overlap the same day? – Thomas Mar 19 '11 at 19:00
  • @Thomas: Updated my question or better said added table structure and result table structure... – Robert Koritnik Mar 19 '11 at 19:19
  • @Thomas: Close but not exactly the same. You get multiple records for each day that has overrides (as many as there are overrides + 1). That's not desired. As per **AD4** newer records take precedence when there are overrides. – Robert Koritnik Mar 19 '11 at 20:50
  • @Robert - Yeah, I overlooked the fourth requirement. Have revised. – Thomas Mar 19 '11 at 20:53
  • @Robert - Btw, it should be noted that you might want another tie-breaker in case it ever happens you have two rows that overlap with identical CreatedOn values. That could happen from an import. If the Id column is an identity column, you could use that as a second tie-breaker. – Thomas Mar 19 '11 at 20:55
  • @Thomas: +1 for putting all the effort into this, but the solution I eventually used is the one provided in my own answer. – Robert Koritnik Apr 25 '11 at 22:19
  • pretty bad one for the big dates range, Maxrecursion: 0 prevents the error for 100+ days but we still end up having a huge recursion, which is not healthy :) – mikus Aug 07 '15 at 15:08
  • @mikus - How is that not solved by using a reasonable `@EndDate`? If a ginormous range is a possibility, then instead of using a CTE created calendar table, create a static calendar table with every possible date that is relevant and use that instead. – Thomas Aug 10 '15 at 04:15
  • reasonable EndDate? You mean not asking for a period of more than 3 months :D, that's a bit of a limitation. You can make a nested CTE and divide the calculation between days and months and get to ~100 months, that's a bit better, but not to cool or pretty – mikus Aug 11 '15 at 07:44
  • @mikus - I've used that code to populate centuries of days without issue. Again, if you use a calendar table, it is a *one-time population* for the life of the database. Even going from say the year 1900 to 2500 is only 220K rows calculated once. That eliminates the CTE that creates the calendar table and instead uses a physical table which will be incredibly fast. – Thomas Aug 12 '15 at 01:00
1

Why do it all in DB when you can do it better in memory

This is the solution (I eventually used) that seemed most reasonable in terms of data transferred, speed and resources.

  1. get actual range definitions from DB to mid tier (smaller amount of data)
  2. generate in memory calendar of a certain date range (faster than in DB)
  3. put those DB definitions in (much easier and faster than DB)

And that's it. I realised that complicating certain things in DB is not not worth it when you have executable in memory code that can do the same manipulation faster and more efficient.

Robert Koritnik
  • 103,639
  • 52
  • 277
  • 404