0

I have a table with a series of date ranges like this:

id, start_date, end_date,
(1, '2022-01-01', '2022-01-31'),
(2, '2022-01-01', '2022-01-05'),
(3, '2022-01-03', '2022-01-06'),
(4, '2022-01-01', '2022-01-01')

I would like to be able to look at each date (from a series) and see how many of these ranges fall into each date. The date series would ideally start from the smallest value in a starting range. But starting from say 1 year ago to now would also be sufficient.

The outcome looking like this:

|    Date    | Count |
| ---------- | ----- |
| 10/07/2022 |   6   |
| 9/07/2022  |   8   |
| 8/07/2022  |  12   |
| 7/07/2022  |   5   |

We cannot use the generate_series function in Amazon Redshift. So I have generated a series like so which goes back 365 days:

SELECT  DATEADD('day', 1-n, (DATE_TRUNC('day', CURRENT_DATE))) AS generated_date
        FROM (SELECT ROW_NUMBER() OVER () AS n FROM stl_scan LIMIT 365) n
        ORDER BY generated_date DESC

Struggling to implement this generated series into my query.

Dom
  • 1
  • 1

1 Answers1

0

I usually create date sequences in Redshift with a recursive CTE like this - trying to create a date table in Redshift

I am wondering why you need a date sequence for this query. I can imagine queries that use one and queries that don't. Can you describe the query you are expecting to use?

Bill Weiner
  • 8,835
  • 2
  • 7
  • 18
  • I was following this - https://dba.stackexchange.com/questions/208241/fastest-way-to-count-how-many-dates-ranges-cover-each-date-from-series The generate_series function in the first answer. Recreating a function that can do the same as this in Redshift is where my problem is. – Dom Jul 20 '22 at 19:34
  • That example was finding the number of ranges for every date within the series. Is this what you want? I read your post as you have a second table with dates you want to test against the table provided. I'm guessing I misread but let me know. – Bill Weiner Jul 20 '22 at 20:09
  • The recursive CTE will give you the list of dates you want but there is a faster approach if your table is very large. Kind of long to explain but I've written a white paper on this approach (comes up a bunch w/ my clients) - http://wad-design.s3-website-us-east-1.amazonaws.com/sql_limits_wp.html - this uses UNION instead of a loop join. – Bill Weiner Jul 20 '22 at 20:13
  • "That example was finding the number of ranges for every date within the series. Is this what you want?" - Yes that's what I need – Dom Jul 21 '22 at 13:17
  • Thanks that clarifies. So the recursive CTE will create the series of dates you are looking for and this can be plugged into the first query from stackexchange (Redshift doesn't support LATERAL JOINs AFAIK). If it isn't clear how to put these pieces together reach out with any confusion and I'll try to help. – Bill Weiner Jul 21 '22 at 15:56
  • The warning I'll give with the JOIN BETWEEN approach is that for large data tables and/or wide date ranges the loop join that is created will produce a very large intermediate dataset. (It makes all the incrementing date rows and then counts them.) The white paper I linked doesn't produce this large intermediate dataset but does use window functions which can be expensive. For large data table / date ranges this window function approach can be orders of magnitude faster that a massive loop join but for small cases the loop join is just fine. – Bill Weiner Jul 21 '22 at 16:01