
Is there a proper way to aggregate a single column when I have many other columns in the query?

I've tried this answer, which works, but it makes my query a lot more verbose.

My current query looks like this:

SELECT t1.foo1, t1.foo2, t2.foo3, t2.foo4, string_agg(t3.aggregated_field, ', ')
FROM tbl1 t1
LEFT JOIN tbl2 t2 ON t1.id = t2.fkeyid
LEFT JOIN tbl3 t3 ON t2.id = t3.fkeyid
GROUP BY t1.foo1, t1.foo2, t2.foo3, t2.foo4, t2.foo5, t2.foo6
ORDER BY t2.foo5, t2.foo6

The real query has many more fields and LEFT JOINs. The important part is that all of these fields have a 1-to-1 or 1-to-0 relationship, except for one 1-to-n field that I want to aggregate, represented by t3.aggregated_field in the pseudo-query above.

Since I'm using an aggregate function, every field listed in the SELECT and ORDER BY must either be aggregated or appear in the GROUP BY clause. This makes an already long query even more verbose.

That is, since foo1 is a primary key, whenever this field is repeated, all the other fields except aggregated_field are equal as well. I want these repeated rows collapsed into a single result row carrying the aggregated field's values (basically a SELECT DISTINCT with an aggregated column).

Is there a better way to do this (without having to put all the other fields in the GROUP BY), or should I just iterate over the result set in my back-end, executing an extra query per row to fetch this 1-to-n relationship?


The server is running PostgreSQL 9.1.9, more specifically:

PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-54), 64-bit

Fabrício Matté
    Why would a developer *not* disclose the version of the software he is asking about. Why? It's a painful mass phenomenon here on SO. It's like otherwise highly intelligent people turn into dummies immediately once they ask a question. Supply your software version with your question. **This should go without saying.** – Erwin Brandstetter Apr 15 '13 at 19:35
  • @ErwinBrandstetter My bad, the version is 9.0+, I'll connect to server's network so I can check the exact version before I add to the question. – Fabrício Matté Apr 15 '13 at 19:40
  • Thanks. My comment is the result of frustration that has been building. It should be so obvious. And yet, so many people don't think of it. Even high-reputation folks. BTW, 9.1 - you go lucky there. My answer should work for you. – Erwin Brandstetter Apr 15 '13 at 19:52
  • @ErwinBrandstetter Yes, I understand. Though the use of `string_agg` implicitly makes it PostgreSQL 9+, I should know better — minor versions can bring significant changes — and I admit it was laziness on my part not to check the version beforehand. Skimming through, the answer makes perfect sense; I'll provide feedback in an hour or so when I have the time to apply and test it. `=]` – Fabrício Matté Apr 15 '13 at 19:55
  • BTW, major versions include first digit after the dot in PostgreSQL. [More on the official versioning site of the project.](http://www.postgresql.org/support/versioning/) – Erwin Brandstetter Apr 15 '13 at 20:06
  • @ErwinBrandstetter damn my stupidity. Thanks for the info. – Fabrício Matté Apr 15 '13 at 20:23

2 Answers


Simple query

This can be much simpler with PostgreSQL 9.1 or later. As explained in this closely related answer:

It is enough to GROUP BY the primary key of a table. Since:

foo1 is a primary key

.. you can simplify your example to:

SELECT foo1, foo2, foo3, foo4, foo5, foo6, string_agg(aggregated_field, ', ')
FROM   tbl1
GROUP  BY 1
ORDER  BY foo7, foo8;  -- have to be spelled out, since not in select list!

Query with multiple tables

However, since you have:

many more fields and LEFT JOINs, the important part is that all these fields have 1 to 1 or 1 to 0 relationship except one field that is 1 to n which I want to aggregate

.. it should be faster and simpler to aggregate first, join later:

SELECT t1.foo1, t1.foo2, ...
     , t2.bar1, t2.bar2, ...
     , a.aggregated_col 
FROM   tbl1 t1
LEFT   JOIN tbl2 t2 ON ...
...
LEFT   JOIN (
   SELECT some_id, string_agg(agg_col, ', ') AS aggregated_col
   FROM   agg_tbl
   GROUP  BY some_id
   ) a ON a.some_id = ?.some_id
ORDER  BY ...

This way the big portion of your query does not need aggregation at all.
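The "aggregate first, join later" pattern can be sketched end to end. This is a minimal, runnable example using Python's built-in sqlite3 module, with SQLite's `group_concat()` standing in for PostgreSQL's `string_agg()`; the table and column names mirror the pseudo-query and are assumptions, not the asker's real schema:

```python
import sqlite3

# Sketch of "aggregate first, join later": the 1-to-n table is collapsed
# to one row per foreign key inside a subquery, then LEFT JOINed, so the
# outer query needs no GROUP BY at all.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tbl1 (id INTEGER PRIMARY KEY, foo1 TEXT);
CREATE TABLE tbl2 (id INTEGER PRIMARY KEY, fkeyid INTEGER, foo3 TEXT);
CREATE TABLE tbl3 (id INTEGER PRIMARY KEY, fkeyid INTEGER, agg_col TEXT);
INSERT INTO tbl1 VALUES (1, 'a'), (2, 'b');
INSERT INTO tbl2 VALUES (1, 1, 'x'), (2, 2, 'y');
INSERT INTO tbl3 VALUES (1, 1, 'p'), (2, 1, 'q');  -- two rows for tbl2.id = 1
""")
rows = con.execute("""
SELECT t1.foo1, t2.foo3, a.aggregated_col
FROM   tbl1 t1
LEFT   JOIN tbl2 t2 ON t1.id = t2.fkeyid
LEFT   JOIN (
   SELECT fkeyid, group_concat(agg_col, ', ') AS aggregated_col
   FROM   tbl3
   GROUP  BY fkeyid
   ) a ON a.fkeyid = t2.id
ORDER  BY t1.foo1
""").fetchall()
print(rows)
```

Note how the row with no matching tbl3 entries still comes back (with NULL in the aggregated column), which is exactly the LEFT JOIN behavior the original query relies on.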

I recently provided a test case in an SQL Fiddle to prove the point in this related answer.

Since you are referring to this related answer: No, DISTINCT is not going to help at all in this case.

Erwin Brandstetter
  • Yes, I noticed the `DISTINCT` wouldn't help in this case a couple hours ago. I'll check your answer throughout when I get home. `=]` – Fabrício Matté Apr 15 '13 at 19:42
  • From my understanding, the sub-query will implicitly create a temporary table with the aggregations of the whole table before the join. In that case I can optimize it a bit if I put a `WHERE` clause in the sub-query, correct? This looks like the best approach, I'll adapt it to my needs. Thanks. – Fabrício Matté Apr 15 '13 at 20:30
  • @FabrícioMatté: A WHERE clause may be useful, especially if you have an index that corresponds to it. However, depending on the whole query, the Postgres query planner may use varying plans, whichever it expects to be fastest (that's where properly configured [planner cost constants](http://www.postgresql.org/docs/current/interactive/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS) come in). It's not necessarily a "temp table" (materialize step). Test with `EXPLAIN ANALYZE` to get the details. – Erwin Brandstetter Apr 15 '13 at 20:36

If the main problem is that the fields (fooN) are computed expressions, then this can help:

SELECT foo1, foo2, foo3, foo4, foo5, foo6, string_agg(aggregated_field, ', ')
FROM tbl1
GROUP BY 1, 2, 3, 4, 5, 6
ORDER BY 5, 6

The ordinals 1, 2, ... refer to the fields by their position in the select list.
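A quick way to see the ordinals in action, again with sqlite3 as a stand-in (PostgreSQL accepts the same `GROUP BY 1` / `ORDER BY 1` syntax); the computed expression `upper(foo1)` is a made-up example:

```python
import sqlite3

# Ordinal positions in GROUP BY / ORDER BY refer to the select list,
# so a computed expression does not have to be repeated.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl1 (foo1 TEXT, val TEXT)")
con.executemany("INSERT INTO tbl1 VALUES (?, ?)",
                [('a', 'x'), ('a', 'y'), ('b', 'z')])
rows = con.execute("""
SELECT upper(foo1), group_concat(val, ', ')
FROM   tbl1
GROUP  BY 1   -- 1 = upper(foo1), not spelled out again
ORDER  BY 1
""").fetchall()
print(rows)
```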

Clodoaldo Neto