Expanding Multiple Time Series in a Table

Question

I have a table containing multiple time series, each one is associated with a given ID.

Something like this

+--------------------------------------+
| ID | start      | end        | value |
+--------------------------------------+
| a  | 01/01/2018 | 03/01/2018 | 5     |
| a  | 03/01/2018 | 04/01/2018 | 6     |
| a  | 04/01/2018 | 06/01/2018 | 7     |
| b  | 01/01/2018 | 04/01/2018 | 3     |
| b  | 04/01/2018 | 06/01/2018 | 4     |
+--------------------------------------+

We see that the time series are define by irregular intervals. I would like to "expand" each time series so there is a row per day in the series.

Like this

+--------------------------------------+
| ID | start      | end        | value |
+--------------------------------------+
| a  | 01/01/2018 | 02/01/2018 | 5     |
| a  | 02/01/2018 | 03/01/2018 | 5     |
| a  | 03/01/2018 | 04/01/2018 | 6     |
| a  | 04/01/2018 | 05/01/2018 | 7     |
| a  | 05/01/2018 | 06/01/2018 | 7     |
| b  | 01/01/2018 | 02/01/2018 | 3     |
| b  | 02/01/2018 | 03/01/2018 | 3     |
| b  | 03/01/2018 | 04/01/2018 | 3     |
| b  | 04/01/2018 | 05/01/2018 | 4     |
| b  | 05/01/2018 | 06/01/2018 | 4     |
+--------------------------------------+

Is getting the latter table from the former possible in SQL, if so can you please point me in the right direction?

Note: Each time series is contiguous and there are no overlapping intervals.

I don't see how to populate that without some sort of looping and a temporary table of some sort. — Zorkind, Jan 22 '18 at 18:27
you need to do it in a query statement or it can be a procedure/function? — Zorkind, Jan 22 '18 at 18:27
Oh i found out a way using ROW_NUMBER and DATEDIFF. i am experimenting here, i will post an answer soon. — Zorkind, Jan 22 '18 at 18:40
Is there a 'generate_series' equivalent of postgres in sql-server? In postgres, you could do `select ids, generate_series(starts, ends, '1 day':: interval) as d,value` — jimmu, Jan 22 '18 at 18:41
Is there a maximum number of months covered by any individual row? — shawnt00, Jan 22 '18 at 19:42

score 2 · Answer 1 · answered Jan 22 '18 at 18:56

You can try this.

DECLARE @T TABLE (ID VARCHAR(2),  start DATE, [end] DATE, value INT )
INSERT INTO @T VALUES
( 'a', '01/01/2018', '01/03/2018', 5 ),
( 'a', '01/03/2018', '01/04/2018', 6 ),
( 'a', '01/04/2018', '01/06/2018', 7 ),
( 'b', '01/01/2018', '01/04/2018', 3 ),
( 'b', '01/04/2018', '01/06/2018', 4 )


;WITH CTE AS (
    SELECT * FROM @T 
    UNION ALL
    SELECT T.ID, DATEADD(DAY,1, CTE.start) Start, T.[end], T.value 
    FROM @T T 
        INNER JOIN CTE ON T.ID = CTE.ID AND T.value = CTE.value 
            AND DATEADD(DAY,1, CTE.start)  < T.[end]
)
SELECT ID , start, DATEADD(DAY,1, start) [end], value FROM CTE
ORDER BY ID, start

Result:

ID   start      end        value
---- ---------- ---------- -----------
a    2018-01-01 2018-01-02 5
a    2018-01-02 2018-01-03 5
a    2018-01-03 2018-01-04 6
a    2018-01-04 2018-01-05 7
a    2018-01-05 2018-01-06 7
b    2018-01-01 2018-01-02 3
b    2018-01-02 2018-01-03 3
b    2018-01-03 2018-01-04 3
b    2018-01-04 2018-01-05 4
b    2018-01-05 2018-01-06 4

Thank you for your time. This conceptually/syntactically seems like the nicest solution for me, however it is far too slow for my needs (the input table can have O(1e5) rows). — Griffin, Jan 23 '18 at 17:38
Thank you for your nice comment. You are right, for your question, recursive cte approach works slower than static numerical multiplier table. — Serkan Arslan, Jan 23 '18 at 18:10

JohnRC · Accepted Answer · 2018-01-24T17:17:22.927

This approach joins the original table with a table of integers from 0, selecting only those integers that are less than the difference between the dates. Each row selected provides one of the records for the expanded date range. I have assumed that no date gap in the original data is more than 9999 days, but if it is you can extend the table of integers by adding a line for tenthousands etc.

SELECT  T.ID, 
        DATEADD(D, V.N, T.Start) [start], 
        DATEADD(D, V.N+1, T.Start) [end], 
        T.Value
    FROM YourTable T
    JOIN (
                SELECT ones.n + 10*tens.n + 100*hundreds.n + 1000*thousands.n N
                    FROM (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) ones(n),
                         (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) tens(n),
                         (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) hundreds(n),
                         (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) thousands(n)

        ) V
    ON V.N < DATEDIFF(D, T.Start, T.[End])

EDIT: I should have acknowledged @slartidan for the elegant way used to generate an integer list in this answer https://stackoverflow.com/a/33146869/1992793

I went with this approach as it was fast enough for my needs. Thank you for your time. — Griffin, Jan 23 '18 at 17:38

score 0 · Answer 3 · answered Jan 22 '18 at 19:56

0

Cross apply is often a faster option. I'm supposing that you might be able to get away with assuming a limit on the number of months covering a range and also assuming that your input dates always fall on the first of the month.

with mnths as (
    select *
    from (values (0), (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11)) as t(offset)
)
select *
from data
    cross apply (
        select dateadd(month, offset, start_dt) as expanded_start_dt
        from mnths
        where dateadd(month, offset, start_dt) < end_dt
    ) as m

answered Jan 22 '18 at 19:56

shawnt00

16,443
3
17
22

While the assumption of limited interval length was ok, the ranges starting on the first of the month was not. Thanks for your time. – Griffin Jan 23 '18 at 17:40
Easy enough to tweak. The biggest difference here was using `cross apply`. – shawnt00 Jan 23 '18 at 19:31

Expanding Multiple Time Series in a Table

3 Answers3