2

I'm porting a simple expense database to Postgres and got stuck on a view using GROUP BY and multiple JOIN clauses. I think Postgres wants me to use all the tables in the GROUP BY clause.

Table definition is at the end. Note that columns account_id, receiving_account_id and place may be NULL and an operation can have 0 tags.

Original CREATE statement

CREATE VIEW details AS SELECT
    op.id,
    op.name,
    c.name,
    CASE --amountsign
        WHEN op.receiving_account_id IS NOT NULL THEN
            CASE
                WHEN op.account_id IS NULL THEN '+'
                ELSE '='
            END
        ELSE '-' 
    END || ' ' || printf("%.2f", op.amount) || ' zł' AS amount,
    CASE --account
        WHEN op.receiving_account_id IS NOT NULL THEN
            CASE
                WHEN op.account_id IS NULL THEN ac2.name
                ELSE ac.name || ' -> ' || ac2.name
            END
        ELSE ac.name
    END AS account,
    t.name AS type,
    CASE --date
        WHEN op.time IS NOT NULL THEN op.date || ' ' || op.time
        ELSE op.date
    END AS date,
    p.name AS place,
    GROUP_CONCAT(tag.name, ', ') AS tags
FROM operation op
LEFT JOIN category c ON op.category_id = c.id
LEFT JOIN type t ON op.type_id = t.id
LEFT JOIN account ac ON op.account_id = ac.id
LEFT JOIN account ac2 ON op.receiving_account_id = ac2.id
LEFT JOIN place p ON op.place_id = p.id
LEFT JOIN operation_tag ot ON op.id = ot.operation_id
LEFT JOIN tag ON ot.tag_id = tag.id
GROUP BY IFNULL (ot.operation_id, op.id)
ORDER BY date DESC

Current query in Postgres

I made some updates and my current statement is:

BEGIN TRANSACTION;
CREATE VIEW details AS SELECT
    op.id,
    op.name,
    c.name,
    CASE --amountsign
        WHEN op.receiving_account_id IS NOT NULL THEN
            CASE
                WHEN op.account_id IS NULL THEN '+'
                ELSE '='
            END
        ELSE '-' 
    END || ' ' || op.amount || ' zł' AS amount,
    CASE --account
        WHEN op.receiving_account_id IS NOT NULL THEN
            CASE
                WHEN op.account_id IS NULL THEN ac2.name
                ELSE ac.name || ' -> ' || ac2.name
            END
        ELSE ac.name
    END AS account,
    t.name AS type,
    CASE --date
        WHEN op.time IS NOT NULL THEN to_char(op.date, 'DD.MM.YY') || ' ' || op.time
        ELSE to_char(op.date, 'DD.MM.YY')
    END AS date,
    p.name AS place,
    STRING_AGG(tag.name, ', ') AS tags
FROM operation op
LEFT JOIN category c ON op.category_id = c.id
LEFT JOIN type t ON op.type_id = t.id
LEFT JOIN account ac ON op.account_id = ac.id
LEFT JOIN account ac2 ON op.receiving_account_id = ac2.id
LEFT JOIN place p ON op.place_id = p.id
LEFT JOIN operation_tag ot ON op.id = ot.operation_id
LEFT JOIN tag ON ot.tag_id = tag.id
GROUP BY COALESCE (ot.operation_id, op.id)
ORDER BY date DESC;
COMMIT;

Here I get Column 'x' must appear in GROUP BY clause errors as I add listed ones:

GROUP BY COALESCE(ot.operation_id, op.id), op.id, c.name, ac2.name, ac.name, t.name, p.name

When I add p.name column I get Column 'p.name' is defined more than once error. How do I fix that?

Table definition

CREATE TABLE operation (
  id integer NOT NULL PRIMARY KEY,
  name character varying(64) NOT NULL,
  category_id integer NOT NULL,
  type_id integer NOT NULL,
  amount numeric(8,2) NOT NULL,
  date date NOT NULL,
  "time" time without time zone NOT NULL,
  place_id integer,
  account_id integer,
  receiving_account_id integer,
  CONSTRAINT categories_transactions FOREIGN KEY (category_id)
      REFERENCES category (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT transactions_accounts FOREIGN KEY (account_id)
      REFERENCES account (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT transactions_accounts_second FOREIGN KEY (receiving_account_id)
      REFERENCES account (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT transactions_places FOREIGN KEY (place_id)
      REFERENCES place (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT transactions_transaction_types FOREIGN KEY (type_id)
      REFERENCES type (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
);
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Krzysztof Antoniak
  • 451
  • 1
  • 6
  • 20

2 Answers2

3

Like @Andomar already provided: Most RDBMS require to group by every column that appears unaggregated - anywhere else in the query (including the SELECT list, but also in the WHERE clause etc.)

The SQL standard also defines that expressions in the GROUP BY clause shall also cover functionally dependent expressions. Postgres implemented that the PK column covers all columns of the same table.

So op.id covers the whole table and this should work for your current query:

GROUP BY op.id, c.name, 5, t.name, p.name

5 being a positional reference to the SELECT list, which is also allowed in Postgres. It's just notational shorthand for repeating the long expression:

CASE
   WHEN op.receiving_account_id IS NOT NULL THEN
      CASE
         WHEN op.account_id IS NULL THEN ac2.name
         ELSE ac.name || ' -> ' || ac2.name
      END
   ELSE ac.name
END

I derive from your names that you have a n:m relationship between operation and tag, implemented with operation_tag. All other joins don't seem to multiply rows, so it would be more efficient to aggregate tags separately - like @Andomar hinted, just get the logic right.

This should work:

SELECT op.id
     , op.name
     , c.name
     , CASE  -- amountsign
          WHEN op.receiving_account_id IS NOT NULL THEN
             CASE WHEN op.account_id IS NULL THEN '+' ELSE '=' END
          ELSE '-' 
       END || ' ' || op.amount || ' zł' AS amount
     , CASE  -- account
          WHEN op.receiving_account_id IS NOT NULL THEN
             CASE
                WHEN op.account_id IS NULL THEN ac2.name
                ELSE ac.name || ' -> ' || ac2.name
             END
          ELSE ac.name
       END AS account
     , t.name AS type
     , to_char(op.date, 'DD.MM.YY') || ' ' || op.time AS date  -- see below
     , p.name AS place
     , ot.tags
FROM   operation op
LEFT   JOIN category c   ON op.category_id = c.id
LEFT   JOIN type     t   ON op.type_id = t.id
LEFT   JOIN account  ac  ON op.account_id = ac.id
LEFT   JOIN account  ac2 ON op.receiving_account_id = ac2.id
LEFT   JOIN place    p   ON op.place_id = p.id
LEFT   JOIN (
   SELECT operation_id, string_agg(t.name, ', ') AS tags
   FROM   operation_tag ot
   LEFT   JOIN tag      t  ON t.id = ot.tag_id
   GROUP  BY 1
   ) ot ON op.id = ot.operation_id
ORDER  BY op.date DESC, op.time DESC;

Asides

You can replace:

CASE --date
   WHEN op.time IS NOT NULL THEN to_char(op.date, 'DD.MM.YY') || ' ' || op.time
   ELSE to_char(op.date, 'DD.MM.YY')
END AS date

with this shorter equivalent:

concat_ws(' ', to_char(op.date, 'DD.MM.YY'), op.time) AS date

But since both columns are defined NOT NULL, you can furher simplify to:

to_char(op.date, 'DD.MM.YY') || ' ' || op.time AS date

Careful with your ORDER BY you have at least one input column also named date. If you use the unqualified name, it will refer to the output column - which is what you want (as clarified in the comment). Details:

However, sorting by the text representation would not sort according to your timeline correctly. Sort by original values instead as suggested in my query above.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thanks for elaborate answer. I cannot see what the error was, though. Regarding sorting, I used output column since I want db to take time in account while sorting. – Krzysztof Antoniak Jan 02 '16 at 19:44
  • @Kikert: Consider my updates concerning `ORDER BY` then. – Erwin Brandstetter Jan 02 '16 at 19:59
  • Nice, I didn't know about Postgres' PK grouping. But positional references? They invite errors and are hard to read. I am surprised you would use or recommend them. – Andomar Jan 02 '16 at 23:56
  • @Andomar: I am demonstrating all options. Whether to use shorthand notation or spell it out, is the readers choice. One should *know* about it in any case. I wouldn't go as far as to recommending it. More error-prone in some contexts, I have to agree there. Convenient to replace long expressions, though. And harder to read? That's up for debate. Repeated long expressions aren't exactly easy on the eye, either. – Erwin Brandstetter Jan 03 '16 at 01:15
  • You can refer to long expressions by alias in the `group by`, for example `select col1 as x from yourtable group by x;` Imho using SQL like `group by 7, 12 4` is indefensibly prioritizing a few keystrokes over future maintainer's sanity :) – Andomar Jan 03 '16 at 15:02
2

Most databases require you to group by every column that appears unaggregated in the select. Unaggregated means not wrapped in an aggregate like min, max or string_agg. So you'd need to group on: op.id, op.name, c.name, op.receiving_account_id, ..., etc.

The reason for this requirement is that the database has to determine a value for the group. By adding the column to the group by clause, you confirm that every row in the group has the same value. For other groups, you must specify which value to use with an aggregate. The exception is MySQL, which just picks a arbitrary value if you don't make a conscious choice.

If your group by is just to create a list of tags, you could move that to a subquery:

left join
        (
        select  id
        ,       string_agg(tag.name, ', ') tags
        from    tag
        group by
                id
        ) t
on      ot.tag_id = t.id

And you can avoid a very long group by for the outer query.

Andomar
  • 232,371
  • 49
  • 380
  • 404