Good evening dear Stack overflow community. This is my first question here.
I have the following issue: I need a query to count distinct values in a rolling date range (3 days) for each country.
I have done some research and found the following discussion here [1]: Query for count of distinct values in a rolling date range.
For the problem I face I need grouping not only by date, but also by country. Please consider the following input table:
Date | Country | |
---|---|---|
1/1/12 | DE | de1@example.com |
1/1/12 | FRA | fra1@example.com |
1/1/12 | SPA | spa1@example.com |
1/2/12 | DE | de1@example.com |
1/2/12 | DE | de2@example.com |
1/3/12 | SPA | spa1@example.com |
1/3/12 | SPA | spa2@example.com |
1/3/12 | FRA | fra2@example.com |
1/4/12 | SPA | spa1@example.com |
1/4/12 | FRA | fra2@example.com |
1/4/12 | FRA | fra3@example.com |
1/4/12 | SPA | spa3@example.com |
The expect outcome with counted distinct emails will be the following:
Date | Country | |
---|---|---|
1/1/12 | DE | 1 |
1/1/12 | FRA | 1 |
1/1/12 | SPA | 1 |
1/2/12 | DE | 2 |
1/2/12 | FRA | 1 |
1/2/12 | SPA | 1 |
1/3/12 | SPA | 2 |
1/3/12 | DE | 2 |
1/3/12 | FRA | 2 |
1/4/12 | SPA | 3 |
1/4/12 | FRA | 2 |
1/4/12 | DE | 2 |
I was trying to modify the solution suggested in the above-mentioned discussion and modify the following by adding country in selected columns and in grouping.
SELECT date
,(SELECT count(DISTINCT email)
FROM tbl
WHERE date BETWEEN g.date - 2 AND g.date
) AS dist_emails
FROM (SELECT generate_series(timestamp '2012-01-01'
, timestamp '2012-01-06'
, interval '1 day')::date) AS g(date)
Unfortunately the updated query does not work, as country is not recognised and it errors.
SELECT date, country,
,(SELECT count(DISTINCT email)
FROM tbl
WHERE date BETWEEN g.date - 2 AND g.date
) AS dist_emails
FROM (SELECT generate_series(timestamp '2012-01-01'
, timestamp '2012-01-06'
, interval '1 day')::date) AS g(date)
GROUP BY 1,2
Will be very grateful for your advice and sharing your expertise on the ways this can be fixed.