1

We have the following activity table and would like to query it to get the number of unique users for each month and the previous month. The date field (createdat) is a timestamp. The query needs to work in PostgreSQL.

Activity table:

|   id   | userid |        createdat        |    username    |
|--------|--------|-------------------------|----------------|
| 1d658a | 4957f3 | 2016-12-06 21:16:35:942 | Tom Jones      |
| 3a86e3 | 684edf | 2016-12-03 21:16:35:943 | Harry Smith    |
| 595756 | 582107 | 2016-12-26 21:16:35:944 | William Hanson |
| 2c87fe | 784723 | 2016-12-07 21:16:35:945 | April Cordon   |
| 32509a | 4957f3 | 2016-12-20 21:16:35:946 | Tom Jones      |
| 72e703 | 582107 | 2017-01-01 21:16:35:947 | William Hanson |
| 6d658a | 582107 | 2016-12-06 21:16:35:948 | William Hanson |
| 5c077c | 5934c4 | 2016-12-06 21:16:35:949 | Sandra Holmes  |
| 92142b | 57ea5c | 2016-12-15 21:16:35:950 | Lucy Lawless   |
| 3dd0a6 | 5934c4 | 2016-12-04 21:16:35:951 | Sandra Holmes  |
| 43509a | 4957f3 | 2016-11-20 21:16:35:946 | Tom Jones      |
| 85142b | 57ea5c | 2016-11-15 21:16:35:950 | Lucy Lawless   |
| 7c87fe | 784723 | 2017-1-07 21:16:35:945  | April Cordon   |
| 9c87fe | 784723 | 2017-2-07 21:16:35:946  | April Cordon   |

Results:

|  Month   | UserThis Month | UserPreviousMonth |
|----------|----------------|-------------------|
| Dec 2016 |              6 |                 2 |
| Jan 2017 |              2 |                 6 |
| Feb 2017 |              1 |                 2 |
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Leo Jones
  • 165
  • 1
  • 12

3 Answers3

1

You can try this query. to_char to get MON YYYY, You can try to write a subquery with lag windows function to get UserPreviousMonth count.

SELECT * 
FROM   (SELECT To_char(createdat, 'MON YYYY') Months, 
               Count(DISTINCT username) UserThisMonth, 
               Lag(Count(DISTINCT username)) OVER ( 
                   ORDER BY Date_part('year', createdat), 
                            Date_part('month',createdat) 
                 ) UserPreviousMonth 
        FROM   t 
        GROUP  BY Date_part('year', createdat), 
                  To_char(createdat, 'MON YYYY'), 
                  Date_part('month', createdat)) t 
WHERE  userpreviousmonth IS NOT NULL 

sqlfiddle:http://sqlfiddle.com/#!15/45e52/2

|   months | userthismonth | userpreviousmonth |
|----------|---------------|-------------------|
| DEC 2016 |             6 |                 2 |
| JAN 2017 |             2 |                 6 |
| FEB 2017 |             1 |                 2 |

EDIT

Types of Dec 2016 and Jan 2017 ... must string, because DateTime need a full date like 2017-01-01. If you need to be sorted and used on the graph I will suggest you sort on this query years and months columns, then make date string on front-end.

SELECT * 
FROM   (SELECT Date_part('year', createdat) years, 
               Date_part('month', createdat) months,
               Count(DISTINCT username) UserThisMonth, 
               Lag(Count(DISTINCT username)) OVER ( 
                   ORDER BY Date_part('year', createdat), 
                            Date_part('month',createdat) 
                 ) UserPreviousMonth 
        FROM  user_activity 
        GROUP  BY Date_part('year', createdat), 
                  Date_part('month', createdat)) t 
WHERE  userpreviousmonth IS NOT NULL 

sqlfiddle:http://sqlfiddle.com/#!15/2da2b/4

| years | months | userthismonth | userpreviousmonth |
|-------|--------|---------------|-------------------|
|  2016 |     12 |             6 |                 2 |
|  2017 |      1 |             2 |                 6 |
|  2017 |      2 |             1 |                 2 |
D-Shih
  • 44,943
  • 6
  • 31
  • 51
  • Thank you very much! The query works but I forgot to mention that the "Month" field in the results table needs to be a "date" data type so it can be sorted and used on the graph. And createdat is an epoch timestamp. – Leo Jones Jun 17 '18 at 01:42
  • @LeoJones Types of `Dec 2016` and `Jan 2017` ... must string, because DateTime need a full date like `2017-01-01` ... , If you need to be sorted and used on the graph I will suggest you used my edit answer. – D-Shih Jun 17 '18 at 08:04
  • Thank you for the edit! I used date_trunc() as Erwin suggested and got the dates. – Leo Jones Jun 19 '18 at 07:11
1

Edit: Shamelessly using @D-Shih's superior method of generating year/month combinations.

A couple of solutions:

WITH ua AS (
  SELECT 
    TO_CHAR(createdate, 'YYYYMM') AS year_month,
    COUNT(DISTINCT userid) distinct_users
  FROM user_activity
  GROUP BY
    TO_CHAR(createdate, 'YYYYMM')
)
SELECT * FROM (
  SELECT 
    TO_DATE(ua.year_month || '01', 'YYYYMMDD') 
        + INTERVAL '1 month' 
        - INTERVAL '1 day' 
    AS month_end,
    ua.distinct_users,
    LAG(ua.distinct_users) OVER (ORDER BY ua.year_month) distinct_users_last_month
  FROM ua
) uas WHERE uas.distinct_users_last_month IS NOT NULL
ORDER BY month_end DESC;

No windowing required:

WITH ua AS (
  SELECT 
    TO_CHAR(createdate, 'YYYYMM') AS year_month,
    TO_CHAR(createdate - INTERVAL '1 MONTH', 'YYYYMM') AS last_month,
    COUNT(DISTINCT userid) AS distinct_users
  FROM user_activity
  GROUP BY
    TO_CHAR(createdate, 'YYYYMM'),
    TO_CHAR(createdate - INTERVAL '1 MONTH', 'YYYYMM')
)
SELECT 
  TO_DATE(ua1.year_month || '01', 'YYYYMMDD') 
        + INTERVAL '1 month' 
        - INTERVAL '1 day' 
    AS month_end,
  ua1.distinct_users,
  ua2.distinct_users AS last_distinct_users
FROM 
  ua ua1 LEFT OUTER JOIN ua ua2 
    ON ua1.year_month = ua2.last_month
WHERE ua2.distinct_users IS NOT NULL
ORDER BY ua1.year_month DESC;

DDL:

CREATE TABLE user_activity (
  id varchar(50),
  userid varchar(50),
  createdate timestamp,
  username varchar(50)
);
COMMIT;

Data:

INSERT INTO user_activity VALUES ('1d658a','4957f3','20161206 21:16:35'::timestamp,'Tom Jones');
INSERT INTO user_activity VALUES ('3a86e3','684edf','20161203 21:16:35'::timestamp,'Harry Smith');
INSERT INTO user_activity VALUES ('595756','582107','20161226 21:16:35'::timestamp,'William Hanson');
INSERT INTO user_activity VALUES ('2c87fe','784723','20161207 21:16:35'::timestamp,'April Cordon');
INSERT INTO user_activity VALUES ('32509a','4957f3','20161220 21:16:35'::timestamp,'Tom Jones');
INSERT INTO user_activity VALUES ('72e703','582107','20170101 21:16:35'::timestamp,'William Hanson');
INSERT INTO user_activity VALUES ('6d658a','582107','20161206 21:16:35'::timestamp,'William Hanson');
INSERT INTO user_activity VALUES ('5c077c','5934c4','20161206 21:16:35'::timestamp,'Sandra Holmes');
INSERT INTO user_activity VALUES ('92142b','57ea5c','20161215 21:16:35'::timestamp,'Lucy Lawless');
INSERT INTO user_activity VALUES ('3dd0a6','5934c4','20161204 21:16:35'::timestamp,'Sandra Holmes');
INSERT INTO user_activity VALUES ('43509a','4957f3','20161120 21:16:35'::timestamp,'Tom Jones');
INSERT INTO user_activity VALUES ('85142b','57ea5c','20161115 21:16:35'::timestamp,'Lucy Lawless');
INSERT INTO user_activity VALUES ('7c87fe','784723','20170107 21:16:35'::timestamp,'April Cordon');
INSERT INTO user_activity VALUES ('9c87fe','784723','20170207 21:16:35'::timestamp,'April Cordo');
COMMIT;
Zerodf
  • 2,208
  • 18
  • 26
  • Thank you very much! The first query works but I forgot to mention that the "Month" field in the results table needs to be a "date" data type so it can be sorted and used on the graph. The second query gave this error msg, "Interval values with month or year parts are not supported." – Leo Jones Jun 17 '18 at 01:41
  • I have refactored both queries to give you a month end date instead of the string formatted YYYYMM. Quick question on the second query, what version of PostgreSQL are you using? – Zerodf Jun 17 '18 at 23:45
  • We use version 8.0.2. Thanks! – Leo Jones Jun 19 '18 at 19:20
1

Fastest and simplest with date_trunc(). Use to_char() once to display the month in preferred format:

WITH cte AS (
   SELECT date_trunc('month', createdat) AS mon
        , count(DISTINCT username) AS ct
   FROM   activity
   GROUP  BY 1
   )
SELECT to_char(t1.mon, 'MON YYYY') AS month
     , t1.ct AS users_this_month
     , t2.ct AS users_previous_month
FROM        cte t1
LEFT   JOIN cte t2 ON t2.mon = t1.mon - interval '1 mon'
ORDER  BY t1.mon;

db<>fiddle here

You commented:

the "Month" field in the results table needs to be a "date" data type so it can be sorted and used on the graph.

For this, simply cast in the final SELECT:

SELECT t1.mon::date AS month ...

Grouping and ordering by a (truncated) timestamp value is more efficient (and reliable) than by multiple values or a text representation.

The result includes the first month ('NOV 2016' in your demo), showing NULL for users_previous_month - like for any previous month without entries. You might want to display 0 instead or drop the row ...

Related:

Aside: usernames in the form of "Tom Jones" are typically not unique. You'll want to operate with a unique ID instead.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228