4

Getting date list in a range in PostgreSQL shows how to get a date range in PostgreSQL. However, Redshift does not support generate_series():

ans=> select (generate_series('2012-06-29', '2012-07-03', '1 day'::interval))::date;
ERROR:  function generate_series("unknown", "unknown", interval) does not exist
HINT:  No function matches the given name and argument types. You may need to add explicit type casts.

Is there way to replicate what generate_series() does in Redshift?

Scott Borden
  • 177
  • 1
  • 2
  • 14
  • 1
    Possible duplicate of [How do I create a dates table in Redshift?](https://stackoverflow.com/questions/47230534/how-do-i-create-a-dates-table-in-redshift) – AlexYes Jan 25 '18 at 13:08

3 Answers3

15

a hack, but works:

use a table with many many rows, and a window function to generate the series

this works as long as you are generating a series that is smaller than the number of rows in the table you're using to generate the series

WITH x(dt) AS (SELECT '2016-01-01'::date)
SELECT 
    dateadd(
        day, 
        COUNT(*) over(rows between unbounded preceding and current row) - 1, 
    dt)
FROM users, x 
LIMIT 100

the initial date 2016-01-01 controls the start date, and the limit controls the number of days in the generated series.

Update: * Will only run on the leader node

Redshift has partial support for the generate_series function but unfortunately does not mention it in their documentation.

This will work and is the shortest & most legible way of generating a series of dates as of this date (2018-01-29):

SELECT ('2016-01-01'::date + x)::date 
FROM generate_series(1, 100, 1) x
Mitch Wheat
  • 295,962
  • 43
  • 465
  • 541
Haleemur Ali
  • 26,718
  • 5
  • 61
  • 85
  • 3
    `generate_series` will work when run on the leader node. However, when joining that query with actual tables it will try to be executed on other nodes and fail. From Redshift [documentation on unsupported PostgreSQL features](https://docs.aws.amazon.com/redshift/latest/dg/c_unsupported-postgresql-functions.html): _"some unsupported functions will not return an error when run on the leader node"_ – mgab Dec 29 '20 at 13:27
  • Yes, you are right. It only works in single node clusters. I'll update the answer with your comment – Haleemur Ali Dec 29 '20 at 14:40
6

One option if you don't want to rely on any existing tables is to pre-generate a series table filled with a range of numbers, one for each row.

create table numbers as (
  select
          p0.n
          + p1.n*2
          + p2.n * power(2,2)
          + p3.n * power(2,3)
          + p4.n * power(2,4)
          + p5.n * power(2,5)
          + p6.n * power(2,6)
          + p7.n * power(2,7)
          + p8.n * power(2,8)
          + p9.n * power(2,9)
          + p10.n * power(2,10)
          as number
        from
          (select 0 as n union select 1) p0,
          (select 0 as n union select 1) p1,
          (select 0 as n union select 1) p2,
          (select 0 as n union select 1) p3,
          (select 0 as n union select 1) p4,
          (select 0 as n union select 1) p5,
          (select 0 as n union select 1) p6,
          (select 0 as n union select 1) p7,
          (select 0 as n union select 1) p8,
          (select 0 as n union select 1) p9,
          (select 0 as n union select 1) p10
  order by 1
);

This will create a table with numbers from 0 to 2^10, if you need more numbers, just add more clauses :D

Once you have this table, you can join to it as a substitute for generate_series

with date_range as (select 
   '2012-06-29'::timestamp as start_date , 
   '2012-07-03'::timestamp as end_date
)
select
    dateadd(day, number::int, start_date)
from date_range
inner join numbers on number <= datediff(day, start_date, end_date)
michael_erasmus
  • 906
  • 1
  • 9
  • 17
1

@michael_erasmus It's interesting, and I make a change for maybe better performance.

CREATE OR REPLACE VIEW v_series_0_to_1024 AS SELECT
  p0.n 
  | (p1.n << 1) 
  | (p2.n << 2) 
  | (p3.n << 3) 
  | (p4.n << 4) 
  | (p5.n << 5) 
  | (p6.n << 6) 
  | (p7.n << 7) 
  | (p8.n << 8) 
  | (p9.n << 9)
  as number
from
  (select 0 as n union select 1) p0,
  (select 0 as n union select 1) p1,
  (select 0 as n union select 1) p2,
  (select 0 as n union select 1) p3,
  (select 0 as n union select 1) p4,
  (select 0 as n union select 1) p5,
  (select 0 as n union select 1) p6,
  (select 0 as n union select 1) p7,
  (select 0 as n union select 1) p8,
  (select 0 as n union select 1) p9
order by number

Last 30 days date series:

select dateadd(day, -number, current_date) as dt from v_series_0_to_1024 where number < 30