4

I need to create an empty time table series for a report so I can left join activity from several tables to it. Every hour of the day does not necessarily have data, but I want it to show null or zero for inactivity instead of omitting that hour of the day.

In later versions of Postgres (post 8.0.2), this is easy in several ways:

SELECT unnest(array[0,1,2,3,4...]) as numbers

OR

CROSS JOIN (select generate_series as hours from generate_series(now()::timestamp, now()::timestamp + interval '1 day', '1 hour'::interval )) date_series

Redshift can run some of these commands, but throws an error when you attempt to run it in conjunction with any of the tables.

WHAT I NEED:

A reliable way to generate a series of numbers (e.g. 0-23) as a subquery that will run on redshift (uses postgres 8.0.2).

John Rotenstein
  • 241,921
  • 22
  • 380
  • 470
Dharam
  • 153
  • 1
  • 1
  • 6
  • http://stackoverflow.com/a/34167753/3019685 – systemjack Aug 02 '16 at 20:40
  • Because Redshift, is not versioned and thus the older version can not ever be used by anyone I can't see why this question should stay open. I'm voting to close. Redshift now supports this. – Evan Carroll Jul 05 '18 at 20:48
  • Evan Carroll, Redshift does not support cross join into subquery with generate series. It shows error "Specified types or functions (one per INFO message) not supported on Redshift tables.;" – user2455668 Jul 24 '18 at 09:12
  • @EvanCarroll is the support for this documented somewhere? I still see generate_series listed as "unsupported" on the redshift docs. We migrated to bigquery so I can't test it myself. – Dharam Jul 20 '20 at 23:21
  • Does this answer your question? [sequence number generation function in AWS redshift](https://stackoverflow.com/questions/22643338/sequence-number-generation-function-in-aws-redshift) – Steve Chambers Mar 02 '21 at 09:01
  • no @SteveChambers the question is very similar, but less specific. The existing workarounds and proposed solutions are all painful to use, since none of them are reliable or versatile. I haven't used redshift in a year or two, so it's possible they've added functionality to address this. – Dharam Mar 15 '21 at 16:32

4 Answers4

13

As long as you have a table that has more rows than your required series has numbers, this is what has worked for me in the past:

select
    (row_number() over (order by 1)) - 1 as hour
from
    large_table
limit 24
;

Which returns numbers 0-23.

moertel
  • 1,523
  • 15
  • 18
2

Unfortunately, Amazon Redshift does not allow use of generate_series() for table functions. The workaround seems to be creating a table of numbers.

See also:

John Rotenstein
  • 241,921
  • 22
  • 380
  • 470
  • 1
    Creating a table of numbers irks me in a very terrible way. It feels like writing the numbers out one by one in a python script. It's just wrong. – Dharam Aug 03 '16 at 23:09
  • You could create a `date` table, with columns such as `DayOfWeek`, `PublicHoliday`, `Month`, `Quarter`, etc. This is a common practice to extract useful information from dates. Then, you could re-use this table as a `Numbers` table without feeling so irksome. – John Rotenstein Aug 04 '16 at 12:10
2

Recursion was released for Redshift in April 2021. Now that recursion is possible in Redshift. You can generate series of numbers (or even table) with below code

with recursive numbers(NUMBER) as
(
select 1 UNION ALL
select NUMBER + 1 from numbers where NUMBER < 28
)
Pirate X
  • 3,023
  • 5
  • 33
  • 60
1

I'm not a big fan of querying a system table just to get a list of row numbers. If it's something constant and small enough like hours of a day, I would go with plain old UNION ALL:

WITH 
  hours_in_day AS (
    SELECT 0 AS hour
    UNION ALL SELECT 1
    UNION ALL SELECT 2
    UNION ALL SELECT 3
    UNION ALL SELECT 4
    ...
    UNION ALL SELECT 23
  )

And then joinhours_in_day to whatever you want to.

pcv
  • 2,121
  • 21
  • 25
  • I've actually had better luck querying a very small table and selecting row_number() over (). Redshift doesn't play nice with repeated UNION ALL sub queries, and even for something as small as hours of the day, we've seen better performance with row_number. – Dharam Feb 12 '18 at 20:42
  • [This answer](https://stackoverflow.com/questions/22643338/sequence-number-generation-function-in-aws-redshift#answer-35500173) also describes how this can be adapted with `CROSS JOIN`s to generate far more numbers. – Steve Chambers Mar 02 '21 at 09:02