0

I have a table with date ranges like

----------------------------------------------------------------
|     id    | date_start               | date_end              |
----------------------------------------------------------------
|     1     | 2017-02-03 08:00:00.000 | 2017-02-03 17:00:00.000|
|     2     | 2017-02-04 15:00:00.000 | 2017-02-05 10:00:00.000|
|     3     | 2017-02-06 14:00:00.000 | 2017-02-07 23:00:00.000|
----------------------------------------------------------------

As you can see, some of the ranges can cover more than 1 day period (like #2, #3), and I need to separate such records by days to have a result like:

----------------------------------------------------------------
|     id    | date_start               | date_end              |
----------------------------------------------------------------
|     1     | 2017-02-03 08:00:00.000 | 2017-02-03 17:00:00.000|
|     2     | 2017-02-04 15:00:00.000 | 2017-02-04 23:59:59.999|
|     2     | 2017-02-05 00:00:00.000 | 2017-02-05 10:00:00.000|
|     3     | 2017-02-06 14:00:00.000 | 2017-02-06 23:59:59.999|
|     3     | 2017-02-07 00:00:00.000 | 2017-02-07 23:00:00.000|
----------------------------------------------------------------

How can I do it with SQL on Redshift?

Denis
  • 11
  • 3

2 Answers2

0
  1. I create a list of number to create 1 day ranges. In this case I create 100 days
  2. I keep lot of columns only for debug
  3. Create 1 day ranges new_start and new_end
  4. The base case is one the first interval and if both dates are on the same day then you dont need change anything
  5. Now on the first interval I select the original date_start same as the last interval I use date_end
  6. For the rest I use new_start and new_end - 1 second
  7. And only do that for one day ranges overlaping your original range

SQL DEMO

WITH days as  (
    SELECT a.n
    from generate_series(1, 100) as a(n)
), ranges as (
    SELECT *, (d.n::text || ' DAY')::interval as i,
           t1.date_start::date + ((d.n - 1)::text || ' DAY')::interval as new_start,
           t1.date_start::date + (d.n::text || ' DAY')::interval as new_end,

           CASE WHEN t1.date_start::date = t1.date_end::date AND d.n = 1
                THEN t1.date_start    
                WHEN t1.date_start::date < t1.date_end::date 
                THEN t1.date_start
                ELSE NULL
           END as date_start1,
           CASE WHEN t1.date_start::date = t1.date_end::date AND d.n = 1
                THEN t1.date_end
           END  date_end1
    FROM Table1 t1
    CROSS JOIN days d
)
SELECT *, CASE WHEN date_start < new_end AND date_end > new_start
               THEN 'overlap'
          END as overlap,
          CASE WHEN date_end1 IS NOT NULL 
               THEN date_start1
               WHEN date_start < new_end AND date_end > new_start
               THEN CASE WHEN date_start > new_start
                         THEN date_start
                         ELSE new_start
                    END
          END as final_start,

          CASE WHEN date_end1 IS NOT NULL 
               THEN date_end1
               WHEN date_start < new_end AND date_end > new_start
               THEN CASE WHEN date_end < new_end
                         THEN date_end
                         ELSE new_end - '1 second'::interval
                    END
          END as final_end                      
FROM ranges
WHERE date_start < new_end AND date_end > new_start
ORDER BY "id", new_start

OUTPUT

enter image description here

Juan Carlos Oropeza
  • 47,252
  • 12
  • 78
  • 118
  • Juan, thanks for your answer! Unfortunately, Redshift doesn't support some Postgres functions, incl generate_series ( http://docs.aws.amazon.com/redshift/latest/dg/c_unsupported-postgresql-functions.html ) . :( – Denis Dec 17 '17 at 20:46
  • You can still do it, but you need to pre-generate the "generate series" table. The `generate_series()` function will work, as long as it does not interact wit the data. Therefore, you can use it to generate static tables with which to join. – John Rotenstein Dec 17 '17 at 22:20
  • please amend answer or mark it as not working with redshift? – Jon Scott Dec 18 '17 at 08:32
  • @JonScott You mean me? If the problem is only `generate_series()` function you can also create the list of numbers with a query similar to [**this**](https://stackoverflow.com/questions/21425546/how-to-generate-a-range-of-numbers-between-two-numbers) – Juan Carlos Oropeza Dec 18 '17 at 13:15
0

Finally, I've done in this way. Works for 2-days continuous time ranges maximum (i.e. session started 2017-12-02, ended 2017-12-04 - will not be taken in this data set; 2017-12-02 -- 2017-12-03 is ok).

 -- Select 1-st day's interval for two-days sessions:
     SELECT sessions.date_start
            ,DATE_TRUNC('day',sessions.date_end) as date_end
       FROM sessions
      WHERE DATEDIFF(day,sessions.date_start,sessions.date_end) = 1


     UNION ALL
     -- Select 2-nd day's interval for two-days sessions:
     SELECT DATE_TRUNC('day',sessions.date_end) as date_start
            ,sessions.date_end as date_end
       FROM sessions
      WHERE DATEDIFF(day, sessions.date_start, sessions.date_end) = 1

    UNION ALL
    -- Select one-day sessions:
     SELECT sessions.date_start as date_start
            ,sessions.date_end as date_end
       FROM sessions
      WHERE DATEDIFF(day, sessions.date_start, sessions.date_end) = 0
Denis
  • 11
  • 3