Pivot table using crosstab and count

Question

I have to display a table like this:

Year	Month	Delivered	Not delivered	Not Received
2021	Jan	10	86	75
2021	Feb	13	36	96
2021	March	49	7	61
2021	Apr	3	21	72

Using raw data generated by this query:

SELECT 
    year,
    TO_CHAR( creation_date, 'Month') AS month,
    marking,
    COUNT(*) AS count 
FROM invoices
GROUP BY 1,2,3

I have tried using crosstab() but I got error:

SELECT * FROM crosstab('
    SELECT 
        year,
        TO_CHAR( creation_date, ''Month'') AS month,
        marking,
        COUNT(*) AS count 
    FROM invoices
    GROUP BY 1,2,3
') AS ct(year text, month text, marking text)

I would prefer to not manually type all marking values because they are a lot.

ERROR:  invalid source data SQL statement
DETAIL:  The provided SQL must return 3 columns: rowid, category, and values.

These two answers show that it gets very difficult to do N dynamic columns especially if you want N *named* dynamic columns... [A1](https://stackoverflow.com/a/15514334/3073340) [A2](https://stackoverflow.com/a/30502937/3073340) good luck! For my own cases, I've only learned how to do this by utilizing [dbt](https://www.getdbt.com/) which enables jinja templating - my tangentially related [answer](https://stackoverflow.com/questions/69979720/snowflake-pivot-for-dynamic-columns-with-dbt-macro) but with slightly different syntax (snowflake). — sgdata, Dec 31 '21 at 16:38
@sgoley. I agree with you that getting N named dynamic columns is a bit difficult issue with pg. I try to propose in my answer a full dynamic solution which is an alternative to the `crosstab` solutions you mention in your comment. — Edouard, Jan 01 '22 at 11:26

Edouard · Accepted Answer · 2022-01-04T21:22:19.013

1. Static solution with a limited list of marking values :

SELECT year
     , TO_CHAR( creation_date, 'Month') AS month
     , COUNT(*) FILTER (WHERE marking = 'Delivered') AS Delivered
     , COUNT(*) FILTER (WHERE marking = 'Not delivered') AS "Not delivered"
     , COUNT(*) FILTER (WHERE marking = 'Not Received') AS "Not Received"
FROM invoices
GROUP BY 1,2

2. Full dynamic solution with a large list of marking values :

This proposal is an alternative solution to the crosstab solution as proposed in A and B.

The proposed solution here just requires a dedicated composite type which can be dynamically created and then it relies on the jsonb type and standard functions :

Starting from your query which counts the number of rows per year, month and marking value :

Using the jsonb_object_agg function, the resulting rows are first aggregated by year and month into jsonb objects whose jsonb keys correspond to the marking values and whose jsonb values correspond to the counts.
the resulting jsonb objects are then converted into records using the jsonb_populate_record function and the dedicated composite type.

First we dynamically create a composite type which corresponds to the ordered list of marking values :

CREATE OR REPLACE PROCEDURE create_composite_type() LANGUAGE plpgsql AS $$
DECLARE
  column_list text ;
BEGIN
  SELECT string_agg(DISTINCT quote_ident(marking) || ' bigint', ',' ORDER BY quote_ident(marking) || ' bigint' ASC)
    INTO column_list
    FROM invoices ;
  
  EXECUTE 'DROP TYPE IF EXISTS composite_type' ;
  EXECUTE 'CREATE TYPE composite_type AS (' || column_list || ')' ;
END ;
$$ ;

CALL create_composite_type() ;

Then the expected result is provided by the following query :

SELECT a.year
     , TO_CHAR(a.year_month, 'Month') AS month
     , (jsonb_populate_record( null :: composite_type
                             , jsonb_object_agg(a.marking, a.count)
                             )
       ).*
 FROM
    ( SELECT year
           , date_trunc('month', creation_date) AS year_month
           , marking
           , count(*) AS count
        FROM invoices AS v
       GROUP BY 1,2,3
    ) AS a
GROUP BY 1,2
ORDER BY month

Obviously, if the list of marking values may vary in time, then you have to recall the create_composite_type() procedure just before executing the query. If you don't update the composite_type, the query will still work (no error !) but some old marking values may be obsolete (not used anymore), and some new marking values may be missing in the query result (not displayed as columns).

See the full demo in dbfiddle.

I have to use cross tab because the marking values are a lot and I prefer something dynamic as a pivot table. — sparkle, Dec 24 '21 at 20:05
This proposed solution is fully dynamic, it can manage a large number of marking values which can vary in time, and it avoids to use a pivot table, so I don't really see what's wrong with it. — Edouard, Jan 03 '22 at 16:34
@Edouard, I came up with several related JSON methods, thinking of turning it into a full Q&A. The record-set composite type is the key to the kingdom, esp. as the JSON functions are very forgiving about mismatches. — L. Rodgers, Sep 26 '22 at 02:12

Erwin Brandstetter · Answer 2 · 2022-03-30T02:11:47.693

You need to generate the crosstab() call dynamically. But since SQL does not allow dynamic return types, you need a two-step workflow:

Generate query
Execute query

If you are unfamiliar with crosstab(), read this first:

PostgreSQL Crosstab Query

It's odd to generate the month from creation_date, but not the year. To simplify, I use a combined column year_month instead.

Query to generate the crosstab() query:

SELECT format(
$f$SELECT * FROM crosstab(
   $q$
   SELECT to_char(date_trunc('month', creation_date), 'YYYY_Month') AS year_month
        , marking
        , COUNT(*) AS ct
   FROM   invoices
   GROUP  BY date_trunc('month', creation_date), marking
   ORDER  BY date_trunc('month', creation_date)  -- optional
   $q$
 , $c$VALUES (%s)$c$
   ) AS ct(year_month text, %s);
$f$, string_agg(quote_literal(sub.marking), '), (')
   , string_agg(quote_ident  (sub.marking), ' int, ') || ' int'
)
FROM  (SELECT DISTINCT marking FROM invoices ORDER BY 1) sub;

If the table invoices is big with only few distinct values for marking (which seems likely) there are faster ways to get distinct values. See:

Optimize GROUP BY query to retrieve latest row per user

Generates a query of the form:

SELECT * FROM crosstab(
   $q$
   SELECT to_char(date_trunc('month', creation_date), 'YYYY_Month') AS year_month
        , marking
        , COUNT(*) AS ct
   FROM   invoices
   GROUP  BY date_trunc('month', creation_date), marking
   ORDER  BY date_trunc('month', creation_date)  -- optional
   $q$
 , $c$VALUES ('Delivered'), ('Not Delivered'), ('Not Received')$c$
   ) AS ct(year_month text, "Delivered" int, "Not Delivered" int, "Not Received" int);

The simplified query does not need "extra columns. See:

Pivot on Multiple Columns using Tablefunc

Note the use date_trunc('month', creation_date) in GROUP BY and ORDER BY. This produces a valid sort order, and faster, too. See:

Also note the use of dollar-quotes to avoid quoting hell. See:

Insert text with single quotes in PostgreSQL

Months without entries don't show up in the result, and no markings for an existing month show as NULL. You can adapt either if need be. See:

Join a count query on generate_series() and retrieve Null values as '0'

Then execute the generated query.

db<>fiddle here (reusing Edouard's fiddle, kudos!)

See:

Execute a dynamic crosstab query

In psql

In psql you can use \qexec to immediately execute the generated query. See:

Simulate CREATE DATABASE IF NOT EXISTS for PostgreSQL?

In Postgres 9.6 or later, you can also use the meta-command \crosstabview instead of crosstab():

test=> SELECT to_char(date_trunc('month', creation_date), 'YYYY_Month') AS year_month
test->      , marking
test->      , COUNT(*) AS count
test-> FROM   invoices
test-> GROUP  BY date_trunc('month', creation_date), 2
test-> ORDER  BY date_trunc('month', creation_date)\crosstabview

   year_month   | Not Received | Delivered | Not Delivered 
----------------+--------------+-----------+---------------
 2020_January   |            1 |         1 |             1
 2020_March     |              |         2 |             2
 2021_January   |            1 |         1 |             2
 2021_February  |            1 |           |              
 2021_March     |              |         1 |              
 2021_August    |            2 |         1 |             1
 2022_August    |              |         2 |              
 2022_November  |            1 |         2 |             3
 2022_December  |            2 |           |              
(9 rows)

Note that \crosstabview - unlike crosstab() - does not support "extra" columns. If you insist on separate year and month columns, you need crosstab().

See:

How do I generate a pivoted CROSS JOIN where the resulting table definition is unknown?

Pivot table using crosstab and count

2 Answers2

In psql

Linked