SELECT from subquery without having to specify all columns in GROUP BY

Question

Idea is to query an article table where an article has a given tag, and then to STRING_AGG all (even unrelated) tags that belong to that article row.

Example tables and query:

CREATE TABLE article (id SERIAL, body TEXT);
CREATE TABLE article_tag (article INT, tag INT);
CREATE TABLE tag (id SERIAL, title TEXT);

SELECT DISTICT ON (id)
    q.id, q.body, STRING_AGG(q.tag_title, '|') tags
FROM (
    SELECT a.*, tag.title tag_title
    FROM article a
        LEFT JOIN article_tag x ON a.id = tag.article
        LEFT JOIN tag ON tag.id = x.tag
    WHERE tag.title = 'someTag'
) q
GROUP BY q.id

Running the above, postgres require that the q.body must be included in GROUP BY:

ERROR:  column "q.body" must appear in the GROUP BY clause or be used in an aggregate function

As I understand it, it's because subquery q doesn't include any PRIMARY key.

I naively thought that the DISTINCT ON would supplement that, but it doesn't seem so.

Is there a way to mark a column in a subquery as PRIMARY so that we don't have to list all columns in GROUP BY clause?

If we do have to list all columns in GROUP BY clause, does that incur significant perf cost?

EDIT: to elaborate, since PostgreSQL 9.1 you don't have to supply non-primary (i.e. functionally dependent) keys when using GROUP BY, e.g. following query works fine:

SELECT a.id, a.body, STRING_AGG(tag.title, '|') tags
FROM article a
    LEFT JOIN article_tag x ON a.id = tag.article
    LEFT JOIN tag ON tag.id = x.tag
GROUP BY a.id

I was wondering if I can leverage the same behavior, but with a subquery (by somehow indicating that q.id is a PRIMARY key).

Possible duplicate of [GROUP BY / aggregate function confusion in SQL](https://stackoverflow.com/questions/4611897/group-by-aggregate-function-confusion-in-sql) — waka, Oct 04 '17 at 12:42
You *do* have to group on all the non-aggregate columns, due to using an aggregate function. — 404, Oct 04 '17 at 12:56
edited to elaborate on what I mean by not having to specify non-primary keys in GROUP BY (thus why it's not a duplicate of the above question) — dwelle, Oct 04 '17 at 13:14

score 1 · Accepted Answer · answered Oct 04 '17 at 13:33

It sadly doesn't work when you wrap your primary key in subquery and I don't know of any way to "mark it" as you suggested.

You can try this workaround using window function and distinct:

CREATE TABLE test1 (id serial primary key, name text, value text);
CREATE TABLE test2 (id serial primary key, test1_id int, value text);

INSERT INTO test1(name, value)
           values('name1', 'test01'), ('name2', 'test02'), ('name3', 'test03');

INSERT INTO test2(test1_id, value)
           values(1, 'test1'), (1, 'test2'), (3, 'test3');

SELECT DISTINCT ON (id) id, name, string_agg(value2, '|') over (partition by id)
  FROM (SELECT test1.*, test2.value AS value2
          FROM test1
          LEFT JOIN test2 ON test2.test1_id = test1.id) AS sub;

id  name    string_agg
1   name1   test1|test2
2   name2   null
3   name3   test3

Demo

score 0 · Answer 2 · answered Oct 04 '17 at 13:04

Problem is in outer SELECT - you should either aggregate columns either group by them. Postgres wants you to specify what to do with q.body - group by it or calculate aggregate. Looks little bit awkward but should work.

SELECT DISTICT ON (id)
    q.id, q.body, STRING_AGG(q.tag_title, '|') tags
FROM (
    SELECT a.*, tag.title tag_title
    FROM article a
        LEFT JOIN article_tag x ON a.id = tag.article
        LEFT JOIN tag ON tag.id = x.tag
    WHERE tag.title = 'someTag'
) q
GROUP BY q.id, q.body
--             ^^^^^^

Another way is to make a query to get id and aggregated tags then join body to it. If you wish I can make an example.

No need, I understand that. My question was whether I can work around it the same way as I could if I selected directly from `article` table, and not from a subquery. — dwelle, Oct 04 '17 at 13:07

SELECT from subquery without having to specify all columns in GROUP BY

2 Answers2