Generalizing a CASE statement (with dynamic names for result columns?)

Question

I currently have a query like so -

SELECT 
   SUM(CASE month WHEN 'January' THEN 1 ELSE 0 END) AS "Jan", 
   SUM(CASE month WHEN 'February' THEN 1 ELSE 0 END) AS "Feb", 
   SUM(CASE month WHEN 'March' THEN 1 ELSE 0 END) AS "Mar", 
   SUM(CASE month WHEN 'April' THEN 1 ELSE 0 END) AS "April", 
   SUM(CASE month WHEN 'May' THEN 1 ELSE 0 END) AS "May", 
   SUM(CASE month WHEN 'June' THEN 1 ELSE 0 END) AS "June"   
FROM tbl
WHERE start >= '2021-01-01'
AND   start <= '2021-06-30'

This query needs to run for the last 6 months every month. For example, since the current month has not ended yet, this query needs to run from Jan 1 to June 30. How do I automate this query so I don't have to change the CASE statements or the dates in the WHERE clause every month.

The output I am expecting

Jan  Feb  Mar  Apr  May  June
2    2    3    6    1    4

I think it may be useful to use the GETDATE () function and then you subtract 1 month that way every time it arrives next month you will always have your 6 months — The Riddler, Jul 15 '21 at 19:42
you can declare some variables for that and run through an SP — The Riddler, Jul 15 '21 at 19:44
Why are you storing the month in a separate column if you also have a date column? — , Jul 15 '21 at 19:58
It can be `Apr` or `April` as long as I can run it every month without having to change a bunch of things. @ErwinBrandstetter — Aaron, Jul 15 '21 at 22:03

Erwin Brandstetter · Accepted Answer · 2021-07-15T23:13:29.513

The column month seems to be redundant. Drop it from the table. start has all the information you need.
(I'd rather not use start as column name as that's a keyword in standard SQL - even if allowed in Postgres.)

SELECT date_trunc('month', start) AS mon, count(*) AS ct
FROM   tbl
WHERE  start >= '2021-01-01'
AND    start <  '2021-07-01'
GROUP  BY 1
ORDER  BY 1;

Use date_trunc() to preserve chronological order. If you need month names in the result:

WITH cte(current_mon) AS (SELECT date_trunc('month', LOCALTIMESTAMP))
SELECT to_char(mon, 'Mon') AS month, COALESCE(data.ct, 0) AS ct
FROM   cte c
CROSS  JOIN generate_series(c.current_mon - interval '6 mon'
                          , c.current_mon - interval '1 mon' 
                          , interval '1 mon') mon
LEFT   JOIN (
   SELECT date_trunc('month', start) AS mon, count(*)::int AS ct
   FROM   tbl, cte c
   WHERE  start >= c.current_mon - interval '6 mon'
   AND    start <  c.current_mon
   GROUP  BY 1
   ) data USING (mon)
ORDER  BY mon;

db<>fiddle here

Returns one row per month, in chronological order (considering the year, too, though it's not in your output!), and truly dynamic.

month	ct
Jan	31
Feb	28
Mar	31
Apr	0
May	31
Jun	30

Note how I first build timestamps for the last six months (excluding the current one) with generate_series() in the first subquery mon. See:

Generating time series between two dates in PostgreSQL

Then LEFT JOIN to counts per month from the relevant time range. This way always returns the last 6 months, even if no rows are found at all. COALESCE makes the count 0 instead of NULL for that case. Related:

PostgreSQL: running count of rows for a query 'by minute'

Note, in particular, that it's faster to aggregate first and join later. See:

Query with LEFT JOIN not returning rows for count of 0

Using standard English month names and 3-letter-abbreviations.

Your original query produces that information in pivoted form: one month per column. But dynamic column names are not possible for a static SQL query. If you really need that, you need a 2-step flow of operation (two round trips to the server):

Build the query.
Execute it.

Well, you could prepare 12 different row types (that's the range of possible result types for your case) and use a polymorphic function to achieve it. But do you actually need the pivoted form?

OK, you asked for it ...

You want a simple call like this?

SELECT * FROM f_tbl_counts_6months(NULL::m6_jul);

It's possible. Here is a proof of concept.
But, honestly, I'd rather avoid the complication and just use the simple query above.

Create a polymorphic function:

CREATE OR REPLACE FUNCTION f_tbl_counts_6months(ANYELEMENT)
  RETURNS SETOF ANYELEMENT 
  LANGUAGE plpgsql AS
$func$
DECLARE
   _current_mon timestamp := date_trunc('month', LOCALTIMESTAMP);
BEGIN
   -- to prevent incorrect column names, input row type must match current date:
   IF right(pg_typeof($1)::text, 3) = to_char(_current_mon, 'mon') THEN
      -- all good!
   ELSE
      RAISE EXCEPTION 'Current date is %. Function requires input >>%<<'
                    , CURRENT_DATE, 'NULL::m6_' || to_char(now(), 'mon');
   END IF;

   RETURN QUERY
   SELECT a[2], a[2], a[3], a[4], a[5], a[6]
   FROM (
      SELECT ARRAY(
         SELECT COALESCE(data.ct, 0)
         FROM   generate_series(_current_mon - interval '6 mon'
                              , _current_mon - interval '1 mon'
                              , interval '1 mon') mon
         LEFT   JOIN (
            SELECT date_trunc('month', start) AS mon, count(*)::int AS ct
            FROM   tbl
            GROUP  BY 1
            ) data USING (mon)
         ORDER  BY mon
         )
      ) sub(a);
END
$func$;

And 12 composite (row) types, one for each month of the year:

CREATE TYPE m6_jan AS ("Jul" int, "Aug" int, "Sep" int, "Oct" int, "Nov" int, "Dec" int);
CREATE TYPE m6_feb AS ("Aug" int, "Sep" int, "Oct" int, "Nov" int, "Dec" int, "Jan" int);
CREATE TYPE m6_mar AS ("Sep" int, "Oct" int, "Nov" int, "Dec" int, "Jan" int, "Feb" int);
CREATE TYPE m6_apr AS ("Oct" int, "Nov" int, "Dec" int, "Jan" int, "Feb" int, "Mar" int);
CREATE TYPE m6_may AS ("Nov" int, "Dec" int, "Jan" int, "Feb" int, "Mar" int, "Apr" int);
CREATE TYPE m6_jun AS ("Dec" int, "Jan" int, "Feb" int, "Mar" int, "Apr" int, "May" int);
CREATE TYPE m6_jul AS ("Jan" int, "Feb" int, "Mar" int, "Apr" int, "May" int, "Jun" int);
CREATE TYPE m6_aug AS ("Feb" int, "Mar" int, "Apr" int, "May" int, "Jun" int, "Jul" int);
CREATE TYPE m6_sep AS ("Mar" int, "Apr" int, "May" int, "Jun" int, "Jul" int, "Aug" int);
CREATE TYPE m6_oct AS ("Apr" int, "May" int, "Jun" int, "Jul" int, "Aug" int, "Sep" int);
CREATE TYPE m6_nov AS ("May" int, "Jun" int, "Jul" int, "Aug" int, "Sep" int, "Oct" int);
CREATE TYPE m6_dec AS ("Jun" int, "Jul" int, "Aug" int, "Sep" int, "Oct" int, "Nov" int);

Then the simple function call works and returns exactly your desired result:

SELECT * FROM f_tbl_counts_6months(NULL::m6_jul);

Jan	Feb	Mar	Apr	May	Jun
31	28	31	0	31	30

Why? How? See:

Refactor a PL/pgSQL function to return the output of various SELECT queries

You are required to call with the right type. I built in a fail-safe to prevent wrong results. If you call with the wrong type, like the following call in July (currently):

SELECT * FROM f_tbl_counts_6months(NULL::m6_nov);

... the function throws an exception with instructions:

ERROR:  Current date is 2021-07-15. Function requires input >>NULL::m6_jul<<
CONTEXT:  PL/pgSQL function f_tbl_counts_6months(anyelement) line 9 at RAISE

This does not align with the output I am expecting. I have updated the question with the updated output. — Aaron, Jul 15 '21 at 20:58
I can't add 12 different CASE statements because then the report will have 12 columns. I only need columns for the last 6 months. — Aaron, Jul 15 '21 at 21:07
@Aaron: The core question is: do you need dynamic column names? That's the tricky part. (And typically not what you want in client code anyway!) The rest is not that hard. — Erwin Brandstetter, Jul 15 '21 at 21:17
I do need dynamic column names. If that is not an option, what would you suggest? — Aaron, Jul 15 '21 at 21:24

Meysam Asadi · Answer 2 · 2021-07-15T22:49:59.380

Go back 1 month from the date now : the result is the end date : DATEADD(MONTH, -1, GETDATE())

7 months back to be the start date : DATEADD(MONTH, -7, GETDATE())

To calculate the first day of the month, subtract today from itself : (DAY(CURRENT_TIMESTAMP) - 1)

get end date

    sql => select DATEADD(MONTH, -1, GETDATE()) - (DAY(CURRENT_TIMESTAMP) - 1)
    postgresql => SELECT now() - INTERVAL '1 month' - (extract(day from now()) - 1 || ' day')::INTERVAL;

get start date

    sql => select DATEADD(MONTH, -7, GETDATE()) - (DAY(CURRENT_TIMESTAMP) - 1)
    postgresql => select now() - INTERVAL '7 month' - (extract(day from now()) - 1 || ' day')::INTERVAL;

query

SELECT 
  SUM(CASE month WHEN 'January' THEN 1 ELSE 0 END) AS "Jan", 
  SUM(CASE month WHEN 'February' THEN 1 ELSE 0 END) AS "Feb", 
  SUM(CASE month WHEN 'March' THEN 1 ELSE 0 END) AS "Mar", 
  SUM(CASE month WHEN 'April' THEN 1 ELSE 0 END) AS "April", 
  SUM(CASE month WHEN 'May' THEN 1 ELSE 0 END) AS "May", 
  SUM(CASE month WHEN 'June' THEN 1 ELSE 0 END) AS "June"   
FROM dateTable
WHERE start >= now() - INTERVAL '7 month' - (extract(day from now()) - 1 || ' day')::INTERVAL
AND start <= now() - INTERVAL '1 month' - (extract(day from now()) - 1 || ' day')::INTERVAL;

postgresql demo in dbfiddle

score -1 · Answer 3 · answered Jul 15 '21 at 21:01

Try this:

DECLARE @STARTDATE AS DATE = DATEADD(d, -31, DATEADD(m, DATEDIFF(m, -1, GETDATE()) - 6, 0))
DECLARE @ENDDATE AS DATE = DATEADD(d, -1, DATEADD(m, DATEDIFF(m, -1,  GETDATE()) - 1, 0)) 

SELECT 
SUM(CASE month WHEN 'January' THEN 1 ELSE 0 END) AS "Jan", 
SUM(CASE month WHEN 'February' THEN 1 ELSE 0 END) AS "Feb", 
SUM(CASE month WHEN 'March' THEN 1 ELSE 0 END) AS "Mar", 
SUM(CASE month WHEN 'April' THEN 1 ELSE 0 END) AS "April", 
SUM(CASE month WHEN 'May' THEN 1 ELSE 0 END) AS "May", 
SUM(CASE month WHEN 'June' THEN 1 ELSE 0 END) AS "June"   
WHERE start >= @STARTDATE
AND start <= @ENDDATE

Generalizing a CASE statement (with dynamic names for result columns?)

3 Answers3

OK, you asked for it ...