1

If this is my query:

select
 (min(timestamp))::date as date,
 (count(distinct(user_id)) as user_id_count
 (row_number() over (order by signup_day desc)-1) as days_since
from
  data.table
where
  timestamp >= current_date - 3
group by
  timestamp
order by
  timestamp asc;

And these are my results

date        | user_id_count   | days_since
------------+-----------------+-------------
2018-01-22  | 3               | 1
2018-01-23  | 5               | 0

How can I get it the table to show (where the user ID count is 0?):

date        | user_id_count   | days_since
------------+-----------------+-------------
2018-01-21  | 0               | 0
2018-01-22  | 3               | 1
2018-01-23  | 5               | 0
eric-o
  • 65
  • 1
  • 5
  • not sure if this will work, have you tried `coalesce(count(...), 0) as user_id_count` ? – cowbert Jan 24 '18 at 03:11
  • AFAIK [`generate_series`](https://stackoverflow.com/q/17282276/479863) does work on Redshift if you use the integer form so you can join to a `generate_series` that produces a number of *days* and add those days to `current_date` to get the dates you're after. – mu is too short Jan 24 '18 at 03:58

2 Answers2

2

You need to generate the dates. In Postgres, generate_series() is the way to go:

select g.ts as dte,
       count(distinct t.user_id) as user_id_count
       row_number() over (order by signup_day desc) - 1) as days_since
from generate_series(current_date::timestamp - interval '3 day', current_date::timestamp, interval '1 day') g(ts) left join
     data.table t
     on t.timestamp::date = g.ts
group by t.ts
order by t.ts;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • That sounds like it would be perfect! My database keeps returning "function generate_series(timestamp without time zone, timestamp without time zone, interval) does not exist" – eric-o Jan 24 '18 at 03:13
  • @user2928789 . . . It definitely exists (http://rextester.com/WIDLY47720), but you could also list the values individually. – Gordon Linoff Jan 24 '18 at 03:15
  • So am I right in understanding that generate_series does not work well on Redshift? – eric-o Jan 24 '18 at 03:28
  • @user2928789 . . . You are correct. The question is tagged "postgres" and the title explicitly mentions "Postgres". – Gordon Linoff Jan 24 '18 at 03:31
  • My apologies, I thought Redshift ran Postgres 8.0 – eric-o Jan 24 '18 at 03:33
0

You have to create a "calendar" table with dates and left join your aggregated result like that:

with 
aggregated_result as (
    select ...
)
select
 t1.date
,coalesce(t2.user_id_count,0) as user_id_count
,coalesce(t2. days_since,0) as days_since
from calendar t1
left join aggregated_result t2
using (date)

more on creating calendar table: How do I create a dates table in Redshift?

AlexYes
  • 4,088
  • 2
  • 15
  • 23