Sum by month and put months as columns

Question

Background

I have time series data on a monthly basis and I would like to sum values for each ID, grouped by month and then have the month names as columns rather than as rows.

Example

+----+------------+-------+-------+
| id | extra_info | month | value |
+----+------------+-------+-------+
| 1  | abc        | jan   | 10    |
| 1  | abc        | feb   | 20    |
| 2  | def        | jan   | 10    |
| 2  | def        | feb   | 5     |
| 1  | abc        | jan   | 15    |
| 3  | ghi        | mar   | 15    |
+----+------------+-------+-------+

Desired Result

+----+------------+-----+-----+-----+
| id | extra_info | jan | feb | mar |
+----+------------+-----+-----+-----+
| 1  | abc        | 25  | 20  | 0   |
| 2  | def        | 10  | 5   | 0   |
| 3  | ghi        | 0   | 0   | 15  |
+----+------------+-----+-----+-----+

Current Approach

I can easily group by month and sum the values, which gets me to:

+----+------------+-------+-------+
| id | extra_info | month | value |
+----+------------+-------+-------+
| 1  | abc        | jan   | 25    |
| 1  | abc        | feb   | 20    |
| 2  | def        | jan   | 10    |
| 2  | def        | feb   | 5     |
| 3  | ghi        | mar   | 15    |
+----+------------+-------+-------+

But now I need those months as column names, and I'm not sure where to go from here.
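For reference, the grouping step described above can be reproduced in a minimal, self-contained sketch. It assumes Python's built-in sqlite3 module as a stand-in for Postgres and borrows the table name `yt` from the first answer below; the question itself does not name the table.

```python
import sqlite3

# Sample rows from the question: (id, extra_info, month, value).
ROWS = [
    (1, "abc", "jan", 10),
    (1, "abc", "feb", 20),
    (2, "def", "jan", 10),
    (2, "def", "feb", 5),
    (1, "abc", "jan", 15),
    (3, "ghi", "mar", 15),
]


def grouped_sums(rows):
    """Sum value per (id, extra_info, month): one output row per combination."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE yt (id INTEGER, extra_info TEXT, month TEXT, value INTEGER)"
    )
    con.executemany("INSERT INTO yt VALUES (?, ?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT id, extra_info, month, sum(value) AS value
        FROM yt
        GROUP BY id, extra_info, month
        ORDER BY id, month
        """
    ).fetchall()
    con.close()
    return result


for row in grouped_sums(ROWS):
    print(row)
```

Because the months are stored as text in this sketch, they sort alphabetically ('feb' before 'jan'); the sums themselves match the table above.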

Additional Information

In terms of language, this query is to be run in PostgreSQL.
The months above are just examples; the real data set is much larger and covers all 12 months across thousands of IDs.

Any ideas from an SQL guru very much appreciated!

score 9 · Answer 1 · answered May 16 '13 at 21:56

You can use an aggregate function with a CASE expression to turn the rows into columns:

select id,
  extra_info,
  sum(case when month = 'jan' then value else 0 end) jan,
  sum(case when month = 'feb' then value else 0 end) feb,
  sum(case when month = 'mar' then value else 0 end) mar,
  sum(case when month = 'apr' then value else 0 end) apr,
  sum(case when month = 'may' then value else 0 end) may,
  sum(case when month = 'jun' then value else 0 end) jun,
  sum(case when month = 'jul' then value else 0 end) jul,
  sum(case when month = 'aug' then value else 0 end) aug,
  sum(case when month = 'sep' then value else 0 end) sep,
  sum(case when month = 'oct' then value else 0 end) oct,
  sum(case when month = 'nov' then value else 0 end) nov,
  sum(case when month = 'dec' then value else 0 end) "dec"
from yt
group by id, extra_info

See SQL Fiddle with Demo
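A minimal sketch that runs this CASE pivot against the question's sample rows. It uses Python's sqlite3 module as a stand-in for Postgres (an assumption made only to keep the example self-contained), and lists just three months for brevity; the other nine columns follow the same pattern.

```python
import sqlite3

# Sample rows from the question: (id, extra_info, month, value).
ROWS = [
    (1, "abc", "jan", 10),
    (1, "abc", "feb", 20),
    (2, "def", "jan", 10),
    (2, "def", "feb", 5),
    (1, "abc", "jan", 15),
    (3, "ghi", "mar", 15),
]


def pivot_with_case(rows):
    """Turn month rows into jan/feb/mar columns with sum(CASE ... ELSE 0 END)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE yt (id INTEGER, extra_info TEXT, month TEXT, value INTEGER)"
    )
    con.executemany("INSERT INTO yt VALUES (?, ?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT id,
          extra_info,
          sum(CASE WHEN month = 'jan' THEN value ELSE 0 END) AS jan,
          sum(CASE WHEN month = 'feb' THEN value ELSE 0 END) AS feb,
          sum(CASE WHEN month = 'mar' THEN value ELSE 0 END) AS mar
        FROM yt
        GROUP BY id, extra_info
        ORDER BY id
        """
    ).fetchall()
    con.close()
    return result


for row in pivot_with_case(ROWS):
    print(row)
# (1, 'abc', 25, 20, 0)
# (2, 'def', 10, 5, 0)
# (3, 'ghi', 0, 0, 15)
```

Because every CASE falls back to `ELSE 0`, a month with no rows for an id comes out as 0, which matches the desired result in the question.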

Taryn

Thanks! I vaguely remember seeing something like this before, wish my brain was more reliable! Is there some way of doing it without manually handling the cases? Works great for months but I'm wondering if there's a more generic form which could be applied to categories with 100s of entries? – Pete Hamilton May 16 '13 at 21:59
@PeterHamilton Yes, you can use the `crosstab` function. Here is a great answer from another user -- http://stackoverflow.com/questions/15506199/dynamic-alternative-to-pivot-with-case-and-group-by/15514334#15514334 – Taryn May 16 '13 at 22:04
Exactly what I needed! I had a version that used subqueries for each month and took hours to build, versus seconds using this. – Ominus May 01 '17 at 17:03
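On the comment asking for a generic form for categories with hundreds of entries: the linked answer uses `crosstab`. A different technique, sketched here under stated assumptions, is to read the distinct categories first and generate the CASE columns in client code. The helper names `quote_ident`, `build_pivot_sql`, and `dynamic_pivot` are hypothetical, and SQLite (via Python's sqlite3) stands in for Postgres.

```python
import sqlite3

# Sample rows from the question: (id, extra_info, month, value).
ROWS = [
    (1, "abc", "jan", 10),
    (1, "abc", "feb", 20),
    (2, "def", "jan", 10),
    (2, "def", "feb", 5),
    (1, "abc", "jan", 15),
    (3, "ghi", "mar", 15),
]


def quote_ident(name):
    """Hypothetical helper: quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_pivot_sql(categories):
    """Hypothetical helper: one sum(CASE ...) column per category.

    Category values go in as bound parameters; only the column aliases are
    spliced into the SQL text, and those are quoted with quote_ident().
    """
    if not categories:
        raise ValueError("need at least one category to pivot on")
    columns = ",\n  ".join(
        f"sum(CASE WHEN month = ? THEN value ELSE 0 END) AS {quote_ident(c)}"
        for c in categories
    )
    sql = (
        "SELECT id, extra_info,\n  " + columns
        + "\nFROM yt\nGROUP BY id, extra_info\nORDER BY id"
    )
    # The ? placeholders appear in category order, so the parameters do too.
    return sql, list(categories)


def dynamic_pivot(con):
    """Read the distinct categories, then build and run the pivot query."""
    categories = [r[0] for r in con.execute("SELECT DISTINCT month FROM yt ORDER BY month")]
    sql, params = build_pivot_sql(categories)
    cur = con.execute(sql, params)
    header = [d[0] for d in cur.description]
    return header, cur.fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE yt (id INTEGER, extra_info TEXT, month TEXT, value INTEGER)")
con.executemany("INSERT INTO yt VALUES (?, ?, ?, ?)", ROWS)
header, pivoted = dynamic_pivot(con)
print(header)
for row in pivoted:
    print(row)
```

The trade-off is two round trips (one for the categories, one for the pivot), and the column set changes whenever the data does, so callers have to read the header rather than assume fixed column names.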

Erwin Brandstetter · Accepted Answer · 2022-08-02T02:29:18.687

Setup

CREATE TABLE tbl (
  id int
, extra_info varchar(3)
, month date
, value int
);
   
INSERT INTO tbl VALUES
  (1, 'abc', '2012-01-01', 10)
, (1, 'abc', '2012-02-01', 20)
, (2, 'def', '2012-01-01', 10)
, (2, 'def', '2012-02-01',  5)
, (1, 'abc', '2012-01-01', 15)
, (3, 'ghi', '2012-03-01', 15)
;

`crosstab()`

I would use crosstab() from the additional tablefunc module. Install it once per database with:

CREATE EXTENSION tablefunc;

Basics:

PostgreSQL Crosstab Query

How to deal with "extra" columns:

Pivot on Multiple Columns using Tablefunc

Advanced usage:

Dynamic alternative to pivot with CASE and GROUP BY

Query

SELECT * FROM crosstab(
   $$
   SELECT id, min(extra_info), to_char(month, 'mon'), sum(value) AS value
   FROM   tbl
   GROUP  BY id, month
   ORDER  BY id, month
   $$
 , $$
   VALUES
     ('jan'::text), ('feb'), ('mar'), ('apr'), ('may'), ('jun')
   , ('jul'),       ('aug'), ('sep'), ('oct'), ('nov'), ('dec')
   $$
   ) AS ct (id  int, extra text
          , jan int, feb int, mar int, apr int, may int, jun int
          , jul int, aug int, sep int, oct int, nov int, dec int);

Obviously, you can only output one extra_info per id. I pick min(extra_info) since you didn't specify. If extra_info is always the same per id, you could also add it to the GROUP BY and select it directly.

Result:

 id | extra | jan | feb | mar | apr | may | jun | jul | aug | sep | oct | nov | dec
----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----
  1 | abc   |  25 |  20 |     |     |     |     |     |     |     |     |     |
  2 | def   |  10 |   5 |     |     |     |     |     |     |     |     |     |
  3 | ghi   |     |     |  15 |     |     |     |     |     |     |     |     |

db<>fiddle here

Installing the tablefunc module (once per database) incurs some overhead, but queries are typically faster and shorter.
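To make the behavior of the two-parameter `crosstab(source_sql, category_sql)` form more concrete, here is a simplified pure-Python model. It is not the tablefunc implementation; it assumes the behavior described in the tablefunc documentation: source rows arrive ordered by row name, the output row name and extra columns come from the first row of each group, and each value lands in the column of its category. The function name `crosstab_emulation` is hypothetical, and in this model unlisted categories are dropped and matching is done on text, so a raw date such as `'2012-01-01'` never matches `'jan'`.

```python
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def crosstab_emulation(source_rows, categories):
    """Simplified model of two-parameter crosstab(source_sql, category_sql).

    source_rows: (row_name, extra, category, value) tuples, already ordered
    by row_name (the ORDER BY in the source query).
    categories: the category values, in output column order.
    Returns (row_name, extra, value_1, ..., value_n) tuples; None plays the
    role of NULL for categories with no source row.
    """
    # Categories are matched on their text form in this model.
    position = {str(cat): i for i, cat in enumerate(categories)}
    result = []
    current = object()  # sentinel that never equals a real row_name
    for row_name, extra, category, value in source_rows:
        if row_name != current:
            # Start a new output row; extra columns come from the group's first row.
            current = row_name
            result.append([row_name, extra] + [None] * len(categories))
        idx = position.get(str(category))
        if idx is not None:  # categories missing from the list are dropped
            result[-1][2 + idx] = value
    return [tuple(r) for r in result]


# What the source query returns: one row per (id, month) with the summed value.
SOURCE = [
    (1, "abc", "jan", 25),
    (1, "abc", "feb", 20),
    (2, "def", "jan", 10),
    (2, "def", "feb", 5),
    (3, "ghi", "mar", 15),
]

for row in crosstab_emulation(SOURCE, MONTHS):
    print(row)

# A raw date category never matches a month abbreviation, so every value is None.
print(crosstab_emulation([(1, "abc", "2012-01-01", 25)], MONTHS))
```

The grouping only looks at consecutive rows, which is why the source query needs its ORDER BY on the row name.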

Pure SQL

If you can't or won't install the additional module, plain SQL got a bit faster with the aggregate FILTER clause added in Postgres 9.4. See:

Aggregate columns with additional (distinct) filters

SELECT id, min(extra_info) AS extra
     , sum(value) FILTER (WHERE extract(month FROM month) = 1)  AS jan
     , sum(value) FILTER (WHERE extract(month FROM month) = 2)  AS feb
     , sum(value) FILTER (WHERE extract(month FROM month) = 3)  AS mar
     , sum(value) FILTER (WHERE extract(month FROM month) = 4)  AS apr
     , sum(value) FILTER (WHERE extract(month FROM month) = 5)  AS may
     , sum(value) FILTER (WHERE extract(month FROM month) = 6)  AS jun
     , sum(value) FILTER (WHERE extract(month FROM month) = 7)  AS jul
     , sum(value) FILTER (WHERE extract(month FROM month) = 8)  AS aug
     , sum(value) FILTER (WHERE extract(month FROM month) = 9)  AS sep
     , sum(value) FILTER (WHERE extract(month FROM month) = 10) AS oct
     , sum(value) FILTER (WHERE extract(month FROM month) = 11) AS nov
     , sum(value) FILTER (WHERE extract(month FROM month) = 12) AS dec
FROM   tbl
GROUP  BY id
ORDER  BY id;
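A sketch of the NULL-for-missing behavior, run against the Setup rows with Python's sqlite3 as a stand-in for Postgres. Two assumptions: older SQLite builds do not accept `FILTER`, so the sketch uses the equivalent `sum(CASE WHEN ... THEN value END)` form without `ELSE`, which likewise returns NULL when no row matches; and SQLite has no `date` type, so `strftime('%m', month)` stands in for extracting the month number, as `extract(month FROM month)` does in Postgres.

```python
import sqlite3

# Rows from the Setup above; SQLite has no date type, so dates are ISO-8601 text.
SETUP_ROWS = [
    (1, "abc", "2012-01-01", 10),
    (1, "abc", "2012-02-01", 20),
    (2, "def", "2012-01-01", 10),
    (2, "def", "2012-02-01", 5),
    (1, "abc", "2012-01-01", 15),
    (3, "ghi", "2012-03-01", 15),
]


def pivot_null_for_missing(rows):
    """Pivot jan..mar; months without rows come out as None (SQL NULL)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tbl (id INTEGER, extra_info TEXT, month TEXT, value INTEGER)"
    )
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", rows)
    # CASE without ELSE yields NULL for other months; sum() skips NULLs and
    # returns NULL when nothing is left, the same result FILTER gives.
    # strftime('%m', month) extracts the two-digit month number.
    result = con.execute(
        """
        SELECT id, min(extra_info) AS extra,
          sum(CASE WHEN strftime('%m', month) = '01' THEN value END) AS jan,
          sum(CASE WHEN strftime('%m', month) = '02' THEN value END) AS feb,
          sum(CASE WHEN strftime('%m', month) = '03' THEN value END) AS mar
        FROM tbl
        GROUP BY id
        ORDER BY id
        """
    ).fetchall()
    con.close()
    return result


for row in pivot_null_for_missing(SETUP_ROWS):
    print(row)
```

The None values line up with the blank cells in the crosstab result above.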

`0` instead of `NULL`

To output 0 instead of NULL for missing values, wrap either query in a subquery and apply COALESCE:

SELECT id, extra
     , COALESCE(jan, 0) AS jan
     , COALESCE(feb, 0) AS feb
     , COALESCE(mar, 0) AS mar
     , COALESCE(apr, 0) AS apr
     , COALESCE(may, 0) AS may
     , COALESCE(jun, 0) AS jun
     , COALESCE(jul, 0) AS jul
     , COALESCE(aug, 0) AS aug
     , COALESCE(sep, 0) AS sep
     , COALESCE(oct, 0) AS oct
     , COALESCE(nov, 0) AS nov
     , COALESCE(dec, 0) AS dec
FROM  (<query from above>) sub;
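A minimal sketch of this wrapping step, assuming SQLite via Python's sqlite3 and a hypothetical helper `wrap_with_coalesce` that generates the repetitive COALESCE list. The inner query uses the NULL-producing CASE form on the question's text month names; the derived table gets an alias, which PostgreSQL requires before version 16.

```python
import sqlite3

# Sample rows from the question: (id, extra_info, month, value).
ROWS = [
    (1, "abc", "jan", 10),
    (1, "abc", "feb", 20),
    (2, "def", "jan", 10),
    (2, "def", "feb", 5),
    (1, "abc", "jan", 15),
    (3, "ghi", "mar", 15),
]

# Inner pivot that leaves NULL for missing months (CASE without ELSE).
INNER = """
    SELECT id, min(extra_info) AS extra,
      sum(CASE WHEN month = 'jan' THEN value END) AS jan,
      sum(CASE WHEN month = 'feb' THEN value END) AS feb,
      sum(CASE WHEN month = 'mar' THEN value END) AS mar
    FROM yt
    GROUP BY id
"""


def wrap_with_coalesce(inner_sql, keep_cols, value_cols):
    """Hypothetical helper: select keep_cols as-is and COALESCE value_cols to 0.

    Column names are spliced in unquoted, so they must be trusted constants.
    """
    cols = list(keep_cols) + [f"COALESCE({c}, 0) AS {c}" for c in value_cols]
    # The derived table is aliased as "sub"; PostgreSQL requires an alias before v16.
    return (
        "SELECT " + ", ".join(cols)
        + "\nFROM (" + inner_sql + ") AS sub"
        + "\nORDER BY " + keep_cols[0]
    )


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE yt (id INTEGER, extra_info TEXT, month TEXT, value INTEGER)")
con.executemany("INSERT INTO yt VALUES (?, ?, ?, ?)", ROWS)
result = con.execute(wrap_with_coalesce(INNER, ["id", "extra"], ["jan", "feb", "mar"])).fetchall()
for row in result:
    print(row)
```

With the NULLs replaced by 0, the output matches the desired result in the question.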