1

I have a table containing multiple time series, each one is associated with a given ID.

Something like this

+--------------------------------------+
| ID | start      | end        | value |
+--------------------------------------+
| a  | 01/01/2018 | 03/01/2018 | 5     |
| a  | 03/01/2018 | 04/01/2018 | 6     |
| a  | 04/01/2018 | 06/01/2018 | 7     |
| b  | 01/01/2018 | 04/01/2018 | 3     |
| b  | 04/01/2018 | 06/01/2018 | 4     |
+--------------------------------------+

We see that the time series are define by irregular intervals. I would like to "expand" each time series so there is a row per day in the series.

Like this

+--------------------------------------+
| ID | start      | end        | value |
+--------------------------------------+
| a  | 01/01/2018 | 02/01/2018 | 5     |
| a  | 02/01/2018 | 03/01/2018 | 5     |
| a  | 03/01/2018 | 04/01/2018 | 6     |
| a  | 04/01/2018 | 05/01/2018 | 7     |
| a  | 05/01/2018 | 06/01/2018 | 7     |
| b  | 01/01/2018 | 02/01/2018 | 3     |
| b  | 02/01/2018 | 03/01/2018 | 3     |
| b  | 03/01/2018 | 04/01/2018 | 3     |
| b  | 04/01/2018 | 05/01/2018 | 4     |
| b  | 05/01/2018 | 06/01/2018 | 4     |
+--------------------------------------+

Is getting the latter table from the former possible in SQL, if so can you please point me in the right direction?

Note: Each time series is contiguous and there are no overlapping intervals.

Griffin
  • 13,184
  • 4
  • 29
  • 43

3 Answers3

2

You can try this.

DECLARE @T TABLE (ID VARCHAR(2),  start DATE, [end] DATE, value INT )
INSERT INTO @T VALUES
( 'a', '01/01/2018', '01/03/2018', 5 ),
( 'a', '01/03/2018', '01/04/2018', 6 ),
( 'a', '01/04/2018', '01/06/2018', 7 ),
( 'b', '01/01/2018', '01/04/2018', 3 ),
( 'b', '01/04/2018', '01/06/2018', 4 )


;WITH CTE AS (
    SELECT * FROM @T 
    UNION ALL
    SELECT T.ID, DATEADD(DAY,1, CTE.start) Start, T.[end], T.value 
    FROM @T T 
        INNER JOIN CTE ON T.ID = CTE.ID AND T.value = CTE.value 
            AND DATEADD(DAY,1, CTE.start)  < T.[end]
)
SELECT ID , start, DATEADD(DAY,1, start) [end], value FROM CTE
ORDER BY ID, start

Result:

ID   start      end        value
---- ---------- ---------- -----------
a    2018-01-01 2018-01-02 5
a    2018-01-02 2018-01-03 5
a    2018-01-03 2018-01-04 6
a    2018-01-04 2018-01-05 7
a    2018-01-05 2018-01-06 7
b    2018-01-01 2018-01-02 3
b    2018-01-02 2018-01-03 3
b    2018-01-03 2018-01-04 3
b    2018-01-04 2018-01-05 4
b    2018-01-05 2018-01-06 4
Serkan Arslan
  • 13,158
  • 4
  • 29
  • 44
  • Thank you for your time. This conceptually/syntactically seems like the nicest solution for me, however it is far too slow for my needs (the input table can have O(1e5) rows). – Griffin Jan 23 '18 at 17:38
  • Thank you for your nice comment. You are right, for your question, recursive cte approach works slower than static numerical multiplier table. – Serkan Arslan Jan 23 '18 at 18:10
1

This approach joins the original table with a table of integers from 0, selecting only those integers that are less than the difference between the dates. Each row selected provides one of the records for the expanded date range. I have assumed that no date gap in the original data is more than 9999 days, but if it is you can extend the table of integers by adding a line for tenthousands etc.

SELECT  T.ID, 
        DATEADD(D, V.N, T.Start) [start], 
        DATEADD(D, V.N+1, T.Start) [end], 
        T.Value
    FROM YourTable T
    JOIN (
                SELECT ones.n + 10*tens.n + 100*hundreds.n + 1000*thousands.n N
                    FROM (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) ones(n),
                         (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) tens(n),
                         (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) hundreds(n),
                         (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) thousands(n)

        ) V
    ON V.N < DATEDIFF(D, T.Start, T.[End])

EDIT: I should have acknowledged @slartidan for the elegant way used to generate an integer list in this answer https://stackoverflow.com/a/33146869/1992793

JohnRC
  • 1,251
  • 1
  • 11
  • 12
0

Cross apply is often a faster option. I'm supposing that you might be able to get away with assuming a limit on the number of months covering a range and also assuming that your input dates always fall on the first of the month.

with mnths as (
    select *
    from (values (0), (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11)) as t(offset)
)
select *
from data
    cross apply (
        select dateadd(month, offset, start_dt) as expanded_start_dt
        from mnths
        where dateadd(month, offset, start_dt) < end_dt
    ) as m
shawnt00
  • 16,443
  • 3
  • 17
  • 22
  • While the assumption of limited interval length was ok, the ranges starting on the first of the month was not. Thanks for your time. – Griffin Jan 23 '18 at 17:40
  • Easy enough to tweak. The biggest difference here was using `cross apply`. – shawnt00 Jan 23 '18 at 19:31