I have data in a table with the following structure where all id
's are unique and there are many different start and end combinations, with the difference between start and end ranging from less than a minute up to the hundreds of minutes. I only need to split out the duration for start/end differences greater than 60 minutes long.
| ID | DURATION_START | DURATION_END |
|--------|-------------------------|-------------------------|
| 0abc23 | 2019-06-29 00:08:00.000 | 2019-06-29 09:18:00.000 |
and I want to create a table that will split out the duration by hour with the id of the original entry attached, like below:
| ID | DURATION_START | DURATION_END |
|--------|-------------------------|-------------------------|
| 0abc23 | 2019-06-29 00:08:00.000 | 2019-06-29 01:07:59.000 |
| 0abc23 | 2019-06-29 01:08:00.000 | 2019-06-29 02:07:59.000 |
| 0abc23 | 2019-06-29 02:08:00.000 | 2019-06-29 03:07:59.000 |
| 0abc23 | 2019-06-29 03:08:00.000 | 2019-06-29 04:07:59.000 |
| 0abc23 | 2019-06-29 04:08:00.000 | 2019-06-29 05:18:00.000 |
I have tried the following code from a similar question for a different database engine:
with table1 as (
select id
, duration_start as fromdate
, duration_end as todate
from test_data
),
tab1 as (
select id
, fromdate
, dateadd(second, 60*60 - 1, fromdate) as todate1 // add an hour minus a second
, todate
from table1
union all
select id
, dateadd(minutes, 60, todate1) as fromdate // add an hour to todate1
, dateadd(second, 2*60*60-1, todate1) as todate1 // add 1 hours to todate1
, todate
from tab1 where date_trunc(hour, todate1) < date_trunc(hour, todate)
),
tab2 as (
select id
, fromdate
, case when todate1 > todate then todate else todate1 end as todate
from tab1
but that gives me
| ID | DURATION_START | DURATION_END |
|--------|-------------------------|-------------------------|
| 0abc23 | 2019-06-29 00:08:00.000 | 2019-06-29 01:07:59.000 |
| 0abc23 | 2019-06-29 02:07:59.000 | 2019-06-29 03:07:58.000 |
| 0abc23 | 2019-06-29 04:07:58.000 | 2019-06-29 05:07:57.000 |
| 0abc23 | 2019-06-29 06:07:57.000 | 2019-06-29 07:07:56.000 |
| 0abc23 | 2019-06-29 08:07:56.000 | 2019-06-29 09:07:55.000 |
I've tried adjusting the amount of time added but haven't been able to get a 1 hour offset between the start and end. I've also attempted to use connect by
and level
(suggested for similar question but on oracle) but haven't been able to get that to compile. I just get a invalid identifier error for LEVEL
.
select dateadd(hour,24,duration_start)
, greatest(duration_start, date_trunc(hour,dateadd(hour,((level-1)::int/24)::int,duration_start)))
, least(duration_start, date_trunc(hour,dateadd(hour,((level)::int/24)::int,duration_start)))
from test_data
connect by prior LEVEL = floor(datediff(hour, duration_start, duration_end)::int*24)+1;
Even just trying to use connect by
gives me an error:
select *
from test_data
connect by prior LEVEL = floor(datediff(hour, duration_start, duration_end)::int*24)+1;
error: SQL compilation error: error line 0 at position -1 invalid identifier 'HOUR'
If anyone can tell me how to adjust my approach for snowflake, that would be much appreciated, thanks!