Tested with PostgreSQL. You also tagged Redshift, which forked at Postgres 8.2, a long time ago. There may be discrepancies ..
Since you seem to be happy with standard ISO format, a simple cast to date would be most efficient:
SELECT count(DISTINCT userid) AS "Count"
, download_time::date AS "Download_Day"
FROM tbl
WHERE download_time >= CURRENT_DATE - 7
AND download_time < CURRENT_DATE
GROUP BY 2;
db<>fiddle here
CURRENT_DATE
is standard SQL and works for both Postgres and Redshift. Related:
About the "last 7 days": I took the last 7 whole days (excluding today - necessarily incomplete), with syntax that can use a plain index on (download_time)
. Related:
Ideally, you have a composite index on (download_time, userid)
(and fulfill some preconditions) to get very fast index-only scans. See:
count(DISTINCT ...)
is typically slow. For big tables with many duplicates, there are faster techniques. Disclose your exact setup and cardinalities if you need to optimize performance.
If the actual data type is timestamptz
, not just timestamp
, you also need to define the time zone defining day boundaries. See:
About the optional short syntax GROUP BY 2
:
About capitalization of identifiers: