
I'm trying to extract transaction details from two already existing tables:

  1. transactions, containing the total amount received,

  2. bills, with a row for each bill received in the transaction, and containing the denomination of the bill.

Both are indexed with a common session id. [Correction: only the transactions table is indexed on session id.]
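A simplified sketch of the two tables (approximate, not the actual definitions; column names are taken from the queries below and the types are guesses):

CREATE TABLE transactions (
    session text,       -- common session id, indexed
    amount  numeric     -- total amount received
);
CREATE INDEX ON transactions (session);

CREATE TABLE bills (
    session text,       -- same session id, but not indexed here
    denom   text        -- denomination of one bill: '50', '20', '10', ...
);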

I've joined the tables and made subqueries to count the number of each bill denomination per transaction (how many 10s, 20s, etc.). I want to get one record for each transaction with all the counts on the same row.

I made it as far as this query:

SELECT
t.session,
to_char(t.amount::numeric, '"$"9990D99') AS "USD",
(select count(b.denom) where b.denom = '50' ) AS "50",
(select count(b.denom) where b.denom = '20') AS "20",
(select count(b.denom) where b.denom = '10') AS "10",
(select count(b.denom) where b.denom = '5') AS "5",
(select count(b.denom) where b.denom = '1') AS "1"

FROM transactions AS t JOIN bills AS b USING (session)
GROUP BY
t.session, t.amount, b.denom
ORDER BY 
t.session,
b.denom ASC;

... which correctly gives me the bill counts, but with one row for each denomination:

   session    |    USD    | 50 | 20 | 10 | 5 | 1
--------------+-----------+----+----+----+---+----
 c64af32f1815 | $  135.00 |    |    |    | 1 |
 c64af32f1815 | $  135.00 |    |    |  1 |   |
 c64af32f1815 | $  135.00 |    |  6 |    |   |
 643e096b6542 | $  175.00 |    |    |    |   | 10
 643e096b6542 | $  175.00 |    |    |    | 1 |
 643e096b6542 | $  175.00 |    |  8 |    |   |
 ce7d2c647eff | $  200.00 |  4 |    |    |   |

What I want is this, with one row per transaction:

   session    |    USD    | 50 | 20 | 10 | 5 | 1
--------------+-----------+----+----+----+---+----
 c64af32f1815 | $  135.00 |    |  6 |  1 | 1 |
 643e096b6542 | $  175.00 |    |  8 |    | 1 | 10
 ce7d2c647eff | $  200.00 |  4 |    |    |   |

What do I need to understand to fix this query?

Revised Query (following @erwin's suggestion to avoid subqueries):

SELECT
t.session,
to_char(t.amount::numeric, '"$"9990D99') AS "USD",
COUNT(NULLIF(b.denom = '100', FALSE)) AS "100",
COUNT(NULLIF(b.denom = '50', FALSE)) AS "50",
COUNT(NULLIF(b.denom = '20', FALSE)) AS "20",
COUNT(NULLIF(b.denom = '10', FALSE)) AS "10",
COUNT(NULLIF(b.denom = '5', FALSE)) AS "5",
COUNT(NULLIF(b.denom = '1', FALSE)) AS "1"

FROM transactions AS t JOIN bills AS b USING (session)
GROUP BY
t.session, t.amount, b.denom
ORDER BY 
t.session,
b.denom ASC;

This query still generates one line of output for each aggregate (count) function call.

  • Thank you all for the suggestions. Two of the answers solved the immediate problem at hand, and seem to be similar solutions. The suggestion of @erwin allowed me to reduce the number of subquery layers from 2 to 1, but I'm still not able to achieve what I want without subqueries. I want to explore this possibility more deeply before choosing one as the answer. And I still want to understand why the results aren't what I expected... Why does each aggregate function call generate a full new line in the output (even when I removed the 2nd-level subquery)? – W L Bentley Mar 12 '16 at 23:21
  • The table definition should be in the question - ideally complete `CREATE TABLE` scripts. We need to see data types and constraints. And your version of Postgres. The cast to `numeric` raises questions. And there are inconsistencies in your example: You say `ORDER BY t.session ...`, but the desired result shows a different sort order. And I already stressed that you cannot include `b.denom` in your `GROUP BY`, which is the primary remaining problem. – Erwin Brandstetter Mar 13 '16 at 04:31
  • The context here is that I'm trying to extract reports from a database that is not mine to change. The queries and the outputs were slightly edited/simplified to focus on the issues at hand -- inconsistencies arose only from that. The cast to numeric is unnecessary and was a copy/paste error. (Thanks for pointing that out.) – W L Bentley Mar 13 '16 at 06:38
  • Ok. All three answers give the results I wanted. (Some of my confusion was self-inflicted in the end.) I'm choosing @erwin's as the best for eliminating the subqueries. Thanks to all for your help. – W L Bentley Mar 13 '16 at 07:41

3 Answers


I think you are almost there. You just need another GROUP BY on top of your query. For instance:

WITH q1 as (SELECT
    t.session,
    to_char(t.amount::numeric, '"$"9990D99') AS "USD",
    (select count(b.denom) where b.denom = '50' ) AS "50",
    (select count(b.denom) where b.denom = '20') AS "20",
    (select count(b.denom) where b.denom = '10') AS "10",
    (select count(b.denom) where b.denom = '5') AS "5",
    (select count(b.denom) where b.denom = '1') AS "1"

    FROM transactions AS t JOIN bills AS b USING (session)
    GROUP BY t.session, t.amount, b.denom)

SELECT session, "USD", SUM("50") AS "50", SUM("20") AS "20", SUM("10") AS "10",
       SUM("5") AS "5", SUM("1") AS "1"
FROM q1
GROUP BY session, "USD"
Renzo
  • I tried modifying the query as you suggest, but it does not recognize USD as a valid column name in the second SELECT. I received `ERROR: column "usd" does not exist`. If I instead refer to it with t.amount or transactions.amount, I receive `ERROR: missing FROM-clause entry for table "t"`. So I am unable to make this work. – W L Bentley Mar 12 '16 at 07:50
  • Did you try modifying both occurrences of USD to "USD"? I edited my answer. – Renzo Mar 12 '16 at 07:52
  • Yes. With "USD" in double quotes, this produces the results I want. – W L Bentley Mar 12 '16 at 07:55

Do not use correlated subqueries. That's inefficient.
And do not include b.denom in the GROUP BY clause. That's your primary error.

Postgres 9.4+

In Postgres 9.4 or later use the dedicated aggregate FILTER feature:

SELECT t.session
     , to_char(t.amount::numeric, '"$"9990D99') AS "USD"
     , count(*) FILTER (WHERE b.denom = '50')   AS "50"  -- !
     , count(*) FILTER (WHERE b.denom = '20')   AS "20"  -- !
     , ...
FROM   ...
GROUP  BY t.session, t.amount  -- !
ORDER  BY ...

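Filled in with the tables and columns from the question, that abbreviated query would read roughly like this (the "100" column is carried over from the revised query above):

SELECT t.session
     , to_char(t.amount::numeric, '"$"9990D99') AS "USD"
     , count(*) FILTER (WHERE b.denom = '100') AS "100"
     , count(*) FILTER (WHERE b.denom = '50')  AS "50"
     , count(*) FILTER (WHERE b.denom = '20')  AS "20"
     , count(*) FILTER (WHERE b.denom = '10')  AS "10"
     , count(*) FILTER (WHERE b.denom = '5')   AS "5"
     , count(*) FILTER (WHERE b.denom = '1')   AS "1"
FROM   transactions t
JOIN   bills b USING (session)
GROUP  BY t.session, t.amount   -- no b.denom here
ORDER  BY t.session;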

Postgres 9.3-

For Postgres 9.3 or older, there are several (less elegant) alternatives:

SELECT t.session
     , to_char(t.amount::numeric, '"$"9990D99') AS "USD"  -- why cast to numeric?
     , count(b.denom = '100' OR NULL) AS "100"  -- bad column name
     , count(b.denom =  '50' OR NULL) AS "50"
     , count(b.denom =  '20' OR NULL) AS "20"
     , count(b.denom =  '10' OR NULL) AS "10"
     , count(b.denom =   '5' OR NULL) AS "5"
     , count(b.denom =   '1' OR NULL) AS "1"
FROM   transactions t
JOIN   bills b USING (session)
GROUP  BY t.session, t.amount
ORDER  BY t.session;

After cleaning up:

SELECT t.session, to_char(t.amount, '"$"9990D99') AS usd
     , d100, d50, d20, d10, d5, d1
FROM   transactions t
LEFT   JOIN (
   SELECT session
        , nullif(count(denom = 100 OR NULL), 0) AS d100
        , nullif(count(denom =  50 OR NULL), 0) AS d50
        , nullif(count(denom =  20 OR NULL), 0) AS d20
        , nullif(count(denom =  10 OR NULL), 0) AS d10
        , nullif(count(denom =   5 OR NULL), 0) AS d5
        , nullif(count(denom =   1 OR NULL), 0) AS d1
   FROM   bills
   GROUP  BY 1
   ) b USING (session)
ORDER  BY session;

What's changed?

  • Don't use illegal identifiers if you can avoid it; then you also don't need double quotes.

  • Assuming integer for bills.denom, so we also don't need single quotes around the constants.

  • Seems like you want NULL in columns where no bills are found. Wrap the result in NULLIF().

  • Since you retrieve the whole table, it's faster to aggregate before you join.


Faster alternative: crosstab()

For top performance, use actual cross tabulation with crosstab(). You need the additional module tablefunc installed.
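If the module is not installed yet, it can be enabled once per database (given sufficient privileges):

CREATE EXTENSION IF NOT EXISTS tablefunc;

With that in place: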

SELECT t.session, to_char(t.amount, '"$"9990D99') AS usd
     , d100, d50, d20, d10, d5, d1
FROM   transactions t
LEFT   JOIN crosstab(
     $$SELECT session, denom, count(*)::int
       FROM   bills
       GROUP  BY 1,2
       ORDER  BY 1$$
   , $$SELECT unnest ('{100,50,20,10,5,1}'::int[])$$
   ) AS ct (session text, d100 int, d50 int, d20 int, d10 int, d5 int, d1 int)
     USING (session);
Erwin Brandstetter
  • Currently running version 9.3.11, so the 9.4 / FILTER suggestion will have to wait until we can upgrade to 9.4. FILTER looks like a more elegant solution, and is very appealing after struggling with subqueries. – W L Bentley Mar 12 '16 at 08:06
  • With version 9.3: Removing the individual SELECTs and replacing them with `SUM((b.denom = '100')::int) AS "100"` is clearly a step in the right direction. But this still leaves each denomination on its own line. If I remove `b.denom` from the GROUP BY clause, I get `ERROR: column "b.denom" must appear in the GROUP BY clause...`. Isn't one subquery still needed to reduce the rows to one per transaction? Or am I missing something? – W L Bentley Mar 12 '16 at 08:47
  • Yes. The "After cleaning up" query works as I expected. Thank you for pushing me to go beyond multiple subqueries. – W L Bentley Mar 13 '16 at 07:35

You can just wrap your query in an outer query:

SELECT
session,
"USD",
sum("50") AS "50",
sum("20") AS "20",
sum("10") AS "10",
sum("5") AS "5",
sum("1") AS "1"

from (SELECT
t.session,
to_char(t.amount::numeric, '"$"9990D99') AS "USD",
(select count(b.denom) where b.denom = '50' ) AS "50",
(select count(b.denom) where b.denom = '20') AS "20",
(select count(b.denom) where b.denom = '10') AS "10",
(select count(b.denom) where b.denom = '5') AS "5",
(select count(b.denom) where b.denom = '1') AS "1"

FROM transactions AS t JOIN bills AS b USING (session)
GROUP BY
t.session, t.amount, b.denom
ORDER BY 
t.session,
b.denom ASC) your_query
GROUP BY session,"USD"
Mesbah Gueffaf