1

I'm using PostgreSQL 9.3.9 to run two different queries that give different results but both group by "month-year". I was wondering how I would create a query to give me the same data in one table?

Query 1:

SELECT CONCAT(EXTRACT(MONTH FROM startedPayingDate), '-', 
            EXTRACT(YEAR FROM startedPayingDate)) AS "Month", 
 COUNT(*) AS "Total AB Paying Customers"
FROM (       
 SELECT cm.customer_id, MIN(cm.created_at) AS startedPayingDate 
 FROM customerusermap AS cm, users as u
 WHERE cm.customer_id = u.customer_id AND cm.user_id<>u.id 
 GROUP BY cm.customer_id ) AS t
GROUP BY 1, EXTRACT(MONTH FROM startedPayingDate), EXTRACT(YEAR FROM startedPayingDate)
ORDER BY EXTRACT(YEAR FROM startedPayingDate), EXTRACT(MONTH FROM startedPayingDate);

Results in something like this:

Month  | Total AB Paying Customers
---------------------------------
3-2014 | 2
4-2014 | 4

Query 2:

SELECT concat(extract(MONTH from u.created_at),'-',extract(year from u.created_at)) as "Month", 
count(u.email) as "Total SMB Paying Customers"
FROM customerusermap AS cm, users AS u 
WHERE cm.customer_id = u.customer_id AND cm.user_id = u.id AND u.paid_status = 'paying' 
GROUP by 1,extract(month from u.created_at),extract(year from u.created_at)
order by extract(year from u.created_at),extract(month from u.created_at);

Results in something like this:

Month  | Total SMB Paying Customers
-----------------------------------
2-2014 | 3
3-2014 | 8
4-2014 | 5

Desired result

I want to merge these two queries into something that results as shown and also orders by year and month (i.e. oldest to newest):

Month  | Total AB Paying Customers | Total SMB Paying Customers | Total | Cumulative
-------------------------------------------------------------------------------------
2-2014 |           0               |            3               |    3  |   3
3-2014 |           2               |            8               |    10 |   13
4-2014 |           4               |            5               |    9  |   22

Table definitions

CREATE TABLE users (
id serial NOT NULL,
firstname character varying(255) NOT NULL,
lastname character varying(255) NOT NULL,
email character varying(255) NOT NULL,
created_at timestamp without time zone NOT NULL DEFAULT now(),
customer_id character varying(255) DEFAULT NULL::character varying,
companyname character varying(255),
primary_user_id integer,
paid_status character varying(255),  -- updated from comment
CONSTRAINT users_pkey PRIMARY KEY (id),
CONSTRAINT primary_user_id_fk FOREIGN KEY (primary_user_id) REFERENCES users (id),
CONSTRAINT users_uuid_key UNIQUE (uuid)
)

Whereas the customerusermap table looks like:

CREATE TABLE customerusermap (
id serial NOT NULL,
user_id integer NOT NULL,
customer_id character varying(255) NOT NULL,
created_at timestamp without time zone NOT NULL DEFAULT now(),
CONSTRAINT customerusermap_pkey PRIMARY KEY (id),
CONSTRAINT customerusermap_user_id_fkey FOREIGN KEY (user_id) REFERENCES users (id),
CONSTRAINT customerusermap_user_id_key UNIQUE (user_id)
);
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Ashley I.
  • 137
  • 10

1 Answers1

1

General solution

The key feature would be a FULL OUTER JOIN, but handle NULL values properly:

SELECT *
     , "Total AB Paying Customers" + "Total SMB Paying Customers" AS "Total"
     , sum("Total AB Paying Customers" + "Total SMB Paying Customers")
         OVER (ORDER BY "Month") AS "Cumulative"
FROM  (
   SELECT "Month"
        , COALESCE(q1."Total AB Paying Customers", 0)  AS "Total AB Paying Customers"
        , COALESCE(q2."Total SMB Paying Customers", 0) AS "Total SMB Paying Customers"
   FROM      (<query1>) q1
   FULL JOIN (<query2>) q2 USING ("Month")
   ) sub;

Use sum() as a window function for the cumulative sum.
The additional subquery layer is only for convenience, so that we don't have to add COALESCE() more often than necessary.
The query could be simplified some more: format the month in the outer SELECT, etc.

Optimized query

Based on your added setup:

SELECT to_char(mon, 'FMMM-YYYY') AS "Month"
     , ct_ab                     AS "Total AB Paying Customers"
     , ct_smb                    AS "Total SMB Paying Customers"
     , ct_ab + ct_smb            AS "Total"
     , sum(ct_ab + ct_smb) OVER (ORDER BY mon)::int AS "Cumulative"
FROM  (
   SELECT mon, COALESCE(q1.ct_ab, 0) AS ct_ab, COALESCE(q2.ct_smb, 0) AS ct_smb
   FROM  (
      SELECT date_trunc('month', start_date) AS mon, count(*)::int AS ct_ab
      FROM  (       
         SELECT cm.customer_id, min(cm.created_at) AS start_date 
         FROM   customerusermap cm
         JOIN   users u USING (customer_id)
         WHERE  cm.user_id <> u.id 
         GROUP  BY 1
         ) t
      GROUP  BY 1
      ) q1
   FULL JOIN (
      SELECT date_trunc('month', u.created_at) AS mon, count(*)::int AS ct_smb
      FROM   customerusermap cm
      JOIN   users u USING (customer_id)
      WHERE  cm.user_id = u.id AND u.paid_status = 'paying' 
      GROUP  BY 1
      ) q2 USING (mon)
   ) sub;
ORDER  BY mon;

Major points

  • Use to_char() to format your month any way you like. And do it just once at the end. The template pattern FMMM produces month numbers without leading zero, like in your original.

  • Use date_trunc() to granulate your timestamp without time zone to month resolution (first timestamp in the month, but that makes no difference here).

  • I added ORDER BY mon to get the sort order like you commented. That works as expected since the column mon is still a timestamp (not converted to a string (text) yet.

  • Since u.email is defined NOT NULL, count(*) does the same as count(u.email) in this context, a bit cheaper.

  • Use explicit JOIN syntax. Same performance, but much clearer.

  • I cast aggregated counts to integer. That's totally optional (and assuming you won't have an integer overflow). So you have all integer numbers in the result instead of bigint and numeric

Compare to your original, you will find this is substantially shorter and a faster.

If performance is important be sure to have indexes on relevant columns. If there are many entries in users to one entry in customerusermap, there are more sophisticated options with a JOIN LATERAL to make both your queries faster:

If you want to include months without any activity, LEFT JOIN to a complete list of months instead. Example:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Oops disregard last comment - this works! How would I sort by date though? ie 10-2012, 11-2012, 12-2012, 1-2013. right now it is sorted by month-year so it is giving me 8-2013, 8-2014, 8-2015, 9-2013 etc – Ashley I. Oct 05 '15 at 19:29
  • As I mentioned: do the formatting *after* everything else. Work with the `date` first, that's simpler and faster - and you get the order right. – Erwin Brandstetter Oct 05 '15 at 19:38
  • Hey Erwin, thanks for the help. just to clarify, what do you mean by work with the `date` first? Should I separate the "Month" to be "Month" and "Year" rather than concatenating? – Ashley I. Oct 05 '15 at 20:59
  • @AshleyI.: If you edit your question to provide the requested information (essential to every such question), I'll give you a superior complete query. – Erwin Brandstetter Oct 05 '15 at 21:08
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/91500/discussion-between-ashley-i-and-erwin-brandstetter). – Ashley I. Oct 06 '15 at 14:30