Looking for a more performant query

Question

I have 2 tables with flex one-to-many relationship, meaning the parent side may not have any child:

Parent_table

id   dim1   price
1    "abc"  1.00
2    "abc"  2.00
3    "def"  1.00

child_table:

id       parent_id
1          1
2          1

The parent query is very straight forward:

select dim1, count(*), sum(price) from parent_table group by dim1

Now, I want to also add another measure call "child_count" to the above query. By doing this, I get a very good performant query but naturally, it double counts the parent columns due to the join:

select p.dim1, count(*), sum(p.price), count(distinct c.id) as child_count
from parent_table p
  left outer join child_table c on c.parent_id = p.id
group by dim1

I believe the below query is exactly what I want, but these tables are extremely large and I cannot finish the query to verify:

select p.dim1, sum(*), sum(p.price), sum(child_count) as child_count
from parent_table p
  left outer join (select c.parent_id, count(c.id) child_count from child_table c) child on p.id = child.parent_id
group by dim1

Do you have or do you know if Postgres has any smart way to achieve the query I wanted without doing a nested query to count the child records for each parent?

Edit: I made some fix to my mistake on the existing question to move the focus away from it. The changes are:

add 2 more rows to parent table
delete quantity column  
change count(quantity) --> count(*)

Edit 2: hmm, I don't know why I forgot it exists, but this seems to work a lot better in Postgresql compared to joining once:

select p.dim1, count(*), sum(p.price), 
  sum(coalesce((select count(distinct id) from child_table where parent_id = p.id), 0) as child_count
from parent_table p
group by dim1

What do you think of it?

Without the DDL for all tables and indexes involved and without the results from `explain(analyze, verbose, buffers, settings)` for your SQL statements, it will be next to impossible to improve performance. Please share some information, in plain text, as an update to your question. — Frank Heikens, Aug 14 '23 at 23:35
When child_table.id is the primary key, there is no point in count(distinct id) since every value is unique by design. — Frank Heikens, Aug 14 '23 at 23:43
`count(quantity)` makes no sense. If `quantity` is always `1`, it's not really a "quantity". It's just redundant noise. And if it isn't, `count(quantity)` seems wrong. Please disclose actual table definitions showing data types and constraints: `CREATE TABLE` scripts. Tell us your version of Postgres and what the added measure is supposed to measure *exactly*, rather than what you chose to call it. Follow instructions for performance questions here: https://stackoverflow.com/tags/postgresql-performance/info Most importantly: is `dim1` defined `UNIQUE`? — Erwin Brandstetter, Aug 15 '23 at 00:39
Sorry, it was my mistake when writing up the example. It should be `count(*)` instead of quantity. The actual tables and query is very large involving many tables, while the "child" has 2 tables. There are FK relationship with index for every parent.id and child.parent_id tables. Even then, any query that isn't being filtered first is extremely slow. — user3758745, Aug 15 '23 at 14:54
I added another query in Edit#2 that seems to work for the specific range I'm testing on. What do you think of it? — user3758745, Aug 15 '23 at 16:43

Erwin Brandstetter · Answer 1 · 2023-08-15T10:47:55.587

2

Going out on a limb, you probably want this query:

SELECT p.dim1
     , sum(p.quantity) AS sum_quantity
     , sum(p.price) AS sum_price
     , COALESCE(sum(child_count), 0) AS sum_child_count
FROM   parent_table p
LEFT   JOIN (
   SELECT parent_id AS id, count(*) AS child_count
   FROM   child_table
   GROUP  BY 1
   ) c USING (id)
GROUP  BY dim1;

Filling in experience with this kind of questions for missing information.

Aggregate rows in the child table before joining to the parent table. See:

Query with LEFT JOIN not returning rows for count of 0

Throwing in COALESCE to convert possible null values.

Also converted to count(quantity) to sum(p.quantity) on suspicion. If quantity is always 1, use count(*) instead. And drop the column quantity.

An index on child_table.parent_id would help performance substantially - if you get an index-only scan out of it.

edited Aug 15 '23 at 10:47

answered Aug 15 '23 at 00:54

Erwin Brandstetter

605,456
145
1,078
1,228

I'm guessing there's no other way to write another query for the child count per parent, which is the main thing I'm looking for here. I was hoping there is some feature I didn't know in Postgresql, similar to the aggregate functions in DB2. – user3758745 Aug 15 '23 at 15:10
I added another query in Edit#2 that seems to work for the specific range I'm testing on. What do you think of it? – user3758745 Aug 15 '23 at 16:43
Depending on what you actually need, there is a better query. My only difficulty is the lack of information. I tried asking. – Erwin Brandstetter Aug 16 '23 at 00:10

score 0 · Answer 2 · answered Aug 14 '23 at 23:56

0

Delete distinct from your query. This yields the same result, but is faster:

select dim1, count(p.quantity), sum(p.price), count(c.id) as child_count
from parent_table p
left join child_table c on c.parent_id = p.id
group by dim1

Being the PK of child_table, child_table.id is already unique so applying distinct will slow your query down for no benefit.

answered Aug 14 '23 at 23:56

Bohemian

412,405
93
575
722

This double counts parent measures – user3758745 Aug 15 '23 at 16:28

Looking for a more performant query

2 Answers2