Get the count of distinct userids for last couple of days

Question

Let's say I need this for the last 7 days of this table:

Userid   Download time
Rab01    2020-04-29 03:28
Klm01    2020-04-29 04:01
Klm01    2020-04-30 05:10
Rab01    2020-04-29 12:14
Osa_3    2020-04-25 09:01

Following is the required output:

Count  Download_time
1      2020-04-25
2      2020-04-29
1      2020-04-30

For `Download_time`, the count should be 1, right? Is that a typo? — Arun Palanisamy, Jun 01 '20 at 13:55

Erwin Brandstetter · Accepted Answer · 2020-06-02T16:53:33.293

Tested with PostgreSQL. You also tagged Redshift, which forked from PostgreSQL 8.0.2 a long time ago. There may be discrepancies.

Since you seem to be happy with standard ISO format, a simple cast to date would be most efficient:

SELECT count(DISTINCT userid) AS "Count"
     , download_time::date AS "Download_Day"
FROM   tbl
WHERE  download_time >= CURRENT_DATE - 7
AND    download_time <  CURRENT_DATE
GROUP  BY 2;


CURRENT_DATE is standard SQL and works for both Postgres and Redshift. Related:

How do I determine the last day of the previous month using PostgreSQL?

About the "last 7 days": I took the last 7 whole days (excluding today, which is necessarily incomplete), with syntax that can use a plain index on (download_time).

Ideally, you have a composite index on (download_time, userid) (and fulfill some preconditions) to get very fast index-only scans. See:

Is a composite index also good for queries on the first field?
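A minimal sketch of such an index, assuming the table name tbl from the query above. The index name is made up for illustration:

```sql
-- Hypothetical index name. Column order matters:
-- the column filtered by range (download_time) comes first.
CREATE INDEX tbl_download_time_userid_idx ON tbl (download_time, userid);
```

This applies to Postgres only. Redshift has no conventional indexes and relies on sort keys instead. In Postgres, index-only scans also depend on the visibility map being reasonably current, so a recently vacuumed table helps.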

count(DISTINCT ...) is typically slow. For big tables with many duplicates, there are faster techniques. Disclose your exact setup and cardinalities if you need to optimize performance.
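One such technique, sketched here under the assumption of the same table and columns as above, is to deduplicate (day, userid) pairs in a subquery and then count plain rows per day. Whether it is actually faster depends on the Postgres version and the data distribution, so compare both variants with EXPLAIN ANALYZE:

```sql
-- Deduplicate (day, userid) pairs first, then count rows per day.
-- count(userid) ignores NULL userids, like count(DISTINCT userid) does.
SELECT count(userid) AS "Count"
     , download_day  AS "Download_Day"
FROM  (
   SELECT DISTINCT download_time::date AS download_day, userid
   FROM   tbl
   WHERE  download_time >= CURRENT_DATE - 7
   AND    download_time <  CURRENT_DATE
   ) sub
GROUP  BY download_day;
```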

If the actual data type is timestamptz, not just timestamp, you also need to specify the time zone that defines the day boundaries. See:

Ignoring time zones altogether in Rails and PostgreSQL
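A sketch of the timestamptz case for Postgres. The zone name 'Europe/Vienna' is only an assumed example; substitute the zone whose calendar days you want to count in:

```sql
-- Assumes download_time is timestamptz; 'Europe/Vienna' is an example zone.
SELECT count(DISTINCT userid) AS "Count"
     , (download_time AT TIME ZONE 'Europe/Vienna')::date AS "Download_Day"
FROM   tbl
       -- local midnight 7 days ago, converted back to timestamptz
WHERE  download_time >= (date_trunc('day', now() AT TIME ZONE 'Europe/Vienna')
                         - interval '7 days') AT TIME ZONE 'Europe/Vienna'
       -- local midnight today (exclusive upper bound)
AND    download_time <  date_trunc('day', now() AT TIME ZONE 'Europe/Vienna')
                        AT TIME ZONE 'Europe/Vienna'
GROUP  BY 2;
```

The bounds are computed once per query and compared against the bare column, so a plain index on (download_time) can still be used.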

About the optional short syntax GROUP BY 2:

Select first row in each GROUP BY group?
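For comparison, the same query with the grouping expression spelled out instead of the positional reference:

```sql
SELECT count(DISTINCT userid) AS "Count"
     , download_time::date AS "Download_Day"
FROM   tbl
WHERE  download_time >= CURRENT_DATE - 7
AND    download_time <  CURRENT_DATE
GROUP  BY download_time::date;  -- equivalent to GROUP BY 2
```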

About capitalization of identifiers:

Are PostgreSQL column names case-sensitive?

now()::date doesn't work in Redshift; getdate() will work here. — ashwini571, Jun 02 '20 at 15:50
@ashwini571: I switched to `CURRENT_DATE`, which works for both. See above. — Erwin Brandstetter, Jun 02 '20 at 16:54

Slava Rozhnev · Answer 2 · 2020-06-01T14:39:46.687


You can use the date_trunc function to extract the day part from the timestamp and group by it.

The query could look like this:

SELECT
    count(DISTINCT Userid) AS Count, -- count of unique users
    to_char(date_trunc('day', Download_time), 'YYYY-MM-DD') AS Download_Day -- truncate the timestamp to the day
FROM tbl -- "table" is a reserved word, so a placeholder name is used
WHERE DATE_PART('day', NOW() - Download_time) < 7 -- last 7 days
GROUP BY Download_Day; -- group by day


edited Jun 01 '20 at 14:39

answered Jun 01 '20 at 14:12
