Postgres data aggregation with variable columns

Question

I have a table of data containing time log information.

create table "time_records" (
    "id" serial NOT NULL PRIMARY KEY,
    "start" timestamp not null,
    "end" timestamp not null,
    "duration" double precision not null,
    "project" varchar(255) not null,
    "case" integer not null,
    "title" text not null,
    "user" varchar(255) not null
);

Here's a few lines of data:

"id","start","end","duration","project","case","title","user"
"1","2014-02-01 11:54:00","2014-02-01 12:20:00","26.18","Project A","933","Something done here","John Smith"
"2","2014-02-02 12:34:00","2014-02-02 15:00:00","146","Project B","990","Something else done","Joshua Kehn"
"3","2014-02-02 17:57:00","2014-02-02 18:39:00","41.38","Project A","933","Another thing done","Bob Frank"
"4","2014-02-03 09:30:00","2014-02-03 11:41:00","131","Project A","983","iOS work","Joshua Kehn"
"5","2014-02-03 10:22:00","2014-02-03 13:29:00","187.7","Project C","966","Created views for things","Alice Swiss"

I can pull bits and pieces of information out of this. For example, a list of every project with time logged between two dates or every person that worked between two dates.

What I would like is to be able to generate a report with the date and then each project across the top with the total amount of time logged for that project.

SELECT
    start::date,
    sum(duration / 60) as "time logged",
    project
FROM
    time_records
WHERE
    project = 'Project A'
GROUP BY
    start::date, project
ORDER BY
    start::date, project;

However I want multiple columns on the output, so combining a select distinct project with this in some way.

Final output would be something like:

date, project a total, project b total, project c total,
2014-02-01,0.5, 0.3, 10,
2014-02-02,1.3, 20, 3,
2014-02-03,20, 10, 10
...

I can get the total per-date per-project with something like:

SELECT
    start::date,
    sum(duration / 60) as "time logged",
    project
FROM
    time_records
GROUP BY
    start::date, project
ORDER BY
    start::date, project;

But then I have multiple dates in rows per-project. I need it to be one date with per-project totals on separate lines.

Does this make sense / is possible with just SQL not writing some code after the query?

Check out the 'crosstab' function from the 'tablefunc' module. — Craig Ringer, Mar 02 '14 at 22:43

Joachim Isaksson · Answer 1 · 2014-03-03T05:49:35.487

A simple way would be to do a "manual" pivot using CASE;

SELECT DATE("start"),
 SUM(CASE WHEN "project"='Project A' THEN duration/60 ELSE 0 END) "Project A",
 SUM(CASE WHEN "project"='Project B' THEN duration/60 ELSE 0 END) "Project B",
 SUM(CASE WHEN "project"='Project C' THEN duration/60 ELSE 0 END) "Project C"
FROM time_records
GROUP BY DATE("start");

An SQLfiddle to test with.

You should be able to do something similar using CROSSTAB(), but I don't have access to PostgreSQL to load the module and test a query using that :-/

score 1 · Answer 2 · edited May 23 '17 at 10:26

For a "pivot" table or cross tabulation use the crosstab() function of the additional module tablefunc.

Table definition

Given this sanitized table definition without reserved SQL key words as identifiers (that's a big no-no, even if you can force it with double-quoting):

CREATE TEMP TABLE time_records (
    id serial PRIMARY KEY,
    t_start timestamp not null,
    t_end timestamp not null,
    duration double precision not null,
    project text not null,
    t_case integer not null,
    title text not null,
    t_user text not null
);

Query

Note how I use the variant with two parameters to deal with missing items in the result properly.

SELECT *
FROM  crosstab (
   $$
   SELECT t_start::date
         , project
         , round(sum(duration / 60)::numeric, 2) AS time_logged
   FROM    time_records
   GROUP   BY 1,2
   ORDER   BY 1,2
   $$
  ,$$VALUES ('Project A'), ('Project B'),('Project C')$$
  ) AS t (
      t_start   date
    , project_a text
    , project_b text
    , project_c text
  );

Result:

t_start    | project_a | project_b | project_c
-----------|-----------|-----------|----------
2014-02-01 | 0.44      |           |
2014-02-02 | 0.69      | 2.43      |
2014-02-03 | 2.18      |           | 3.13

Tested with Postgres 9.3.

Explanation, details and links in this related answer:
PostgreSQL Crosstab Query

This looks perfect, thanks for the refresher on the reserved words. Given that, is it possible to have `project_a` etc generated from the table itself, possibly from a subquery? — Josh K, Mar 03 '14 at 22:06
@JoshK: You would need two steps: 1. build a statement like the above based on data in your table. 2. execute that statement. SQL demands to know the return type, so this is not simply possible in a single step. I wrote a lot more [under this related question](http://stackoverflow.com/questions/15506199/dynamic-alternative-to-pivot-with-case-and-group-by/15514334). — Erwin Brandstetter, Mar 03 '14 at 22:16

Postgres data aggregation with variable columns

2 Answers2

Table definition

Query