Querying row counts segregated by date ranges

Question

I've got a PostgreSQL 9.2.1 database, and I'm trying (and failing) to compose a SQL query that shows, for each distinct test (testname), the number of failures (current_status='FAILED'), segregated by month (last_update) and showing 0 for months without failures. Here's the table definition:

                                       Table "public.tests"
     Column     |            Type             |                          Modifiers                          
----------------+-----------------------------+-------------------------------------------------------------
 id             | bigint                      | not null default nextval('tests_id_seq'::regclass)
 testname       | text                        | not null
 last_update    | timestamp without time zone | not null default now()
 current_status | text                        | not null

What I'd like to get back from that is something like this:

 testname    | Jan2012  | Feb2012  | Mar2012  | Apr2012  | May2012   | Jun2012   | Jul2012   | Aug2012   | Sep2012   | Oct2012   | Nov2012   | Dec2012
-------------+-----------------------------------------------------------------------------------------------------------------------------------------
 abq         |   2      |   5      |   2      |   0      |   7       |  4        |   8       |   0       |     6     |   15      |  1        |  0
 bar         |   0      |   0      |   2      |   0      |   9       |  8        |   8       |   2       |     6     |   15      |  1        |  1
 cho         |   15     |   1      |   2      |   3      |   4       |  8        |   7       |   3       |     6     |   1       |  5        |  6

At this point, the best that I could come up with is the following, which is admittedly not close:

SELECT testname, count(current_status) AS failure_count
FROM tests
WHERE current_status='FAILED'
AND last_update>'2012-09-01'
AND last_update<='2012-09-30'
GROUP by testname
ORDER BY testname ;

I think I'd need to somehow use COALESCE to get 0 values to show up in the results, plus some crazy JOINs to show multiple months of results, and maybe even a window function?

+1 for showing version, table definition, and expected results. Any chance you can post some sample data (maybe to SQLFiddle.com) so we don't need to make up dummy data? — Craig Ringer, Dec 04 '12 at 00:45
Sorry, I'm not familiar with SQLFiddle. Are you asking to see something like 'select testname,last_update,current_status from tests' ? — netllama, Dec 04 '12 at 04:34
Yes, or preferably `INSERT` statements. Try `pg_dump --data-only --inserts -t TABLENAME DATABASENAME` — Craig Ringer, Dec 04 '12 at 06:29

score 1 · Accepted Answer · edited May 23 '17 at 12:03

Use the `crosstab()` function with two parameters.

Should work like this, to get values for 2012:

SELECT * FROM crosstab(
     $$SELECT testname, to_char(last_update, 'mon_YYYY'), count(*)::int AS ct
        FROM   tests
        WHERE  current_status = 'FAILED'
        AND    last_update >= '2012-01-01 0:0'
        AND    last_update <  '2013-01-01 0:0'  -- proper date range!
        GROUP  BY 1,2
        ORDER  BY 1,2$$

    ,$$VALUES
      ('jan_2012'::text), ('feb_2012'), ('mar_2012')
    , ('apr_2012'), ('may_2012'), ('jun_2012')
    , ('jul_2012'), ('aug_2012'), ('sep_2012')
    , ('oct_2012'), ('nov_2012'), ('dec_2012')$$)
AS ct (testname  text
   , jan_2012 int, feb_2012 int, mar_2012 int
   , apr_2012 int, may_2012 int, jun_2012 int
   , jul_2012 int, aug_2012 int, sep_2012 int
   , oct_2012 int, nov_2012 int, dec_2012 int);
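
A note on prerequisites: `crosstab()` is not part of core PostgreSQL but ships with the additional module `tablefunc`. A minimal sketch, assuming you have the privileges to install extensions in the current database (this has to be done once per database):

```sql
-- Install the tablefunc module, which provides crosstab().
-- CREATE EXTENSION is available in PostgreSQL 9.1 and later.
CREATE EXTENSION IF NOT EXISTS tablefunc;
```

If the extension is missing, the query above fails with an error saying that the function `crosstab` does not exist.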

Find detailed explanation under this related question.

~~I didn't test.~~ As @Craig commented, sample values would have helped.
Tested now with my own test case.

Don't display NULL values

The main problem (that months without rows wouldn't show up at all) is averted by the crosstab() function with two parameters.

You cannot use COALESCE in the inner query, because the NULL values are inserted by crosstab() itself. You could ...

1. Wrap the whole thing into a subquery:

SELECT testname
      ,COALESCE(jan_2012, 0) AS jan_2012
      ,COALESCE(feb_2012, 0) AS feb_2012
      ,COALESCE(mar_2012, 0) AS mar_2012
      , ...
FROM (
    -- crosstab() query from above (without its trailing semicolon)
    ) x;

2. `LEFT JOIN` the primary query to the full list of months.

In this case, the source query returns a row for every test and every month, so you don't need the second parameter.
For a bigger range you could use generate_series() to create the values.

SELECT * FROM crosstab(
     $$SELECT t.testname, m.mon, count(x.testname)::int AS ct
       FROM  (
          VALUES
           (1, 'jan_2012'::text), (2, 'feb_2012'), (3, 'mar_2012')
          ,(4, 'apr_2012'), (5, 'may_2012'), (6, 'jun_2012')
          ,(7, 'jul_2012'), (8, 'aug_2012'), (9, 'sep_2012')
          ,(10, 'oct_2012'), (11, 'nov_2012'), (12, 'dec_2012')
       ) m(ord, mon)
       CROSS JOIN (SELECT DISTINCT testname FROM tests) t
       LEFT JOIN (
          SELECT testname
                ,to_char(last_update, 'mon_YYYY') AS mon
          FROM   tests
          WHERE  current_status = 'FAILED'
          AND    last_update >= '2012-01-01 0:0'
          AND    last_update <  '2013-01-01 0:0'  -- proper date range!
          ) x USING (testname, mon)  -- match on test AND month, not month alone
       GROUP  BY t.testname, m.ord, m.mon
       -- sort chronologically: one-parameter crosstab() fills columns in row order
       ORDER  BY t.testname, m.ord$$
     )
AS ct (testname  text
   , jan_2012 int, feb_2012 int, mar_2012 int
   , apr_2012 int, may_2012 int, jun_2012 int
   , jul_2012 int, aug_2012 int, sep_2012 int
   , oct_2012 int, nov_2012 int, dec_2012 int);
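
To illustrate the remark about `generate_series()`: the following is a sketch of the `LEFT JOIN` variant with the month list generated instead of spelled out. It joins on the truncated timestamp rather than on the month label, and it sorts on that timestamp so the rows reach `crosstab()` in chronological order, which matters because the one-parameter form fills the output columns in row order. The aliases `m`, `t` and `x` are just illustrative names.

```sql
SELECT * FROM crosstab(
     $$SELECT t.testname, to_char(m.mon, 'mon_YYYY'), count(x.testname)::int AS ct
       FROM   generate_series(timestamp '2012-01-01'
                            , timestamp '2012-12-01'
                            , interval  '1 month') m(mon)   -- one row per month
       CROSS  JOIN (SELECT DISTINCT testname FROM tests) t  -- every test x every month
       LEFT   JOIN (
          SELECT testname
                ,date_trunc('month', last_update) AS mon
          FROM   tests
          WHERE  current_status = 'FAILED'
          AND    last_update >= '2012-01-01 0:0'
          AND    last_update <  '2013-01-01 0:0'
          ) x ON x.testname = t.testname AND x.mon = m.mon
       GROUP  BY t.testname, m.mon
       ORDER  BY t.testname, m.mon$$
     )
AS ct (testname  text
   , jan_2012 int, feb_2012 int, mar_2012 int
   , apr_2012 int, may_2012 int, jun_2012 int
   , jul_2012 int, aug_2012 int, sep_2012 int
   , oct_2012 int, nov_2012 int, dec_2012 int);
```

The column definition list still has to be written by hand, because `crosstab()` returns a set of `record` and cannot derive column names from the data.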

Test case with sample data

Here is a test case with some sample data that the OP failed to provide. I used this to test it and make it work.

CREATE TEMP TABLE tests (
  id             bigserial PRIMARY KEY
 ,testname       text NOT NULL
 ,last_update    timestamp without time zone NOT NULL DEFAULT now()
 ,current_status text NOT NULL
 );

INSERT INTO tests (testname, last_update, current_status)
VALUES
  ('foo', '2012-12-05 21:01', 'FAILED')
 ,('foo', '2012-12-05 21:01', 'FAILED')
 ,('foo', '2012-11-05 21:01', 'FAILED')
 ,('bar', '2012-02-05 21:01', 'FAILED')
 ,('bar', '2012-02-05 21:01', 'FAILED')
 ,('bar', '2012-03-05 21:01', 'FAILED')
 ,('bar', '2012-04-05 21:01', 'FAILED')
 ,('bar', '2012-05-05 21:01', 'FAILED');
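
For reference, here is option 1 written out in full so it can be run against the sample data above. This is a sketch that assumes the `tablefunc` module is installed; the outer `ORDER BY` is an addition for a stable row order.

```sql
SELECT testname
      ,COALESCE(jan_2012, 0) AS jan_2012, COALESCE(feb_2012, 0) AS feb_2012
      ,COALESCE(mar_2012, 0) AS mar_2012, COALESCE(apr_2012, 0) AS apr_2012
      ,COALESCE(may_2012, 0) AS may_2012, COALESCE(jun_2012, 0) AS jun_2012
      ,COALESCE(jul_2012, 0) AS jul_2012, COALESCE(aug_2012, 0) AS aug_2012
      ,COALESCE(sep_2012, 0) AS sep_2012, COALESCE(oct_2012, 0) AS oct_2012
      ,COALESCE(nov_2012, 0) AS nov_2012, COALESCE(dec_2012, 0) AS dec_2012
FROM (
    SELECT * FROM crosstab(
         -- source query: one row per (test, month) that has failures
         $$SELECT testname, to_char(last_update, 'mon_YYYY'), count(*)::int AS ct
            FROM   tests
            WHERE  current_status = 'FAILED'
            AND    last_update >= '2012-01-01 0:0'
            AND    last_update <  '2013-01-01 0:0'
            GROUP  BY 1,2
            ORDER  BY 1,2$$
         -- category query: fixes the set and order of the month columns
        ,$$VALUES
          ('jan_2012'::text), ('feb_2012'), ('mar_2012')
        , ('apr_2012'), ('may_2012'), ('jun_2012')
        , ('jul_2012'), ('aug_2012'), ('sep_2012')
        , ('oct_2012'), ('nov_2012'), ('dec_2012')$$)
    AS ct (testname  text
       , jan_2012 int, feb_2012 int, mar_2012 int
       , apr_2012 int, may_2012 int, jun_2012 int
       , jul_2012 int, aug_2012 int, sep_2012 int
       , oct_2012 int, nov_2012 int, dec_2012 int)
    ) x
ORDER  BY testname;

-- With the sample data above:
--   bar: feb_2012 = 2, mar_2012 = 1, apr_2012 = 1, may_2012 = 1, all other months 0
--   foo: nov_2012 = 1, dec_2012 = 2, all other months 0
```

The `COALESCE` calls sit outside `crosstab()` because that is where the NULL values originate; the source query never produces a row for a month without failures.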

Awesome thanks! This gets me 99% of the way there. Only thing missing is that null counts aren't reported as 0. — netllama, Dec 04 '12 at 17:48
I thought I could force the null counts to be zeroes with coalesce, but it's not having any impact: SELECT testname, to_char(last_update, 'mon_YYYY') AS last_update, coalesce(count(*),0) AS ct — netllama, Dec 04 '12 at 18:07
Tried to force the NULLs to 0 with CASE, but that too doesn't work. I must be missing something silly: CASE WHEN count(*) IS NULL THEN 0 ELSE count(*) END as ct — netllama, Dec 04 '12 at 18:49
@netllama: I provided added solutions for the additional problem. — Erwin Brandstetter, Dec 04 '12 at 21:10
Thanks, I'm giving this a try, but the first solution doesn't seem to work: ERROR: COALESCE types text and integer cannot be matched — netllama, Dec 05 '12 at 18:21
The second with a JOIN also isn't working: ERROR: column "tests.testname" must appear in the GROUP BY clause or be used in an aggregate function — netllama, Dec 05 '12 at 18:21
@netllama: The first error was a type incompatibility. I changed the resulting type to integer to fix it. Solution 2. needed an improvement: `CROSS JOIN` the calendar rows to a table of testnames to cover all rows. We did ask for sample data so we can test, and I mentioned my solution was untested for this reason. — Erwin Brandstetter, Dec 05 '12 at 20:27
Thanks. The first solution no longer generates an error, but also still doesn't show zeroes in place of the null values. The second however works perfectly! — netllama, Dec 06 '12 at 00:28
@netllama: The first version works for me, too. (Only fixed the column alias.) I don't see how it could show any `NULL` values. — Erwin Brandstetter, Dec 06 '12 at 10:29
