0

I have 2 tables with flex one-to-many relationship, meaning the parent side may not have any child:

Parent_table

id   dim1   price
1    "abc"  1.00
2    "abc"  2.00
3    "def"  1.00

child_table:

id       parent_id
1          1
2          1

The parent query is very straight forward:

select dim1, count(*), sum(price) from parent_table group by dim1

Now, I want to also add another measure call "child_count" to the above query. By doing this, I get a very good performant query but naturally, it double counts the parent columns due to the join:

select p.dim1, count(*), sum(p.price), count(distinct c.id) as child_count
from parent_table p
  left outer join child_table c on c.parent_id = p.id
group by dim1

I believe the below query is exactly what I want, but these tables are extremely large and I cannot finish the query to verify:

select p.dim1, sum(*), sum(p.price), sum(child_count) as child_count
from parent_table p
  left outer join (select c.parent_id, count(c.id) child_count from child_table c) child on p.id = child.parent_id
group by dim1

Do you have or do you know if Postgres has any smart way to achieve the query I wanted without doing a nested query to count the child records for each parent?

Edit: I made some fix to my mistake on the existing question to move the focus away from it. The changes are:

add 2 more rows to parent table
delete quantity column  
change count(quantity) --> count(*)

Edit 2: hmm, I don't know why I forgot it exists, but this seems to work a lot better in Postgresql compared to joining once:

select p.dim1, count(*), sum(p.price), 
  sum(coalesce((select count(distinct id) from child_table where parent_id = p.id), 0) as child_count
from parent_table p
group by dim1

What do you think of it?

user3758745
  • 777
  • 1
  • 6
  • 19
  • 2
    Without the DDL for all tables and indexes involved and without the results from `explain(analyze, verbose, buffers, settings)` for your SQL statements, it will be next to impossible to improve performance. Please share some information, in plain text, as an update to your question. – Frank Heikens Aug 14 '23 at 23:35
  • When child_table.id is the primary key, there is no point in count(distinct id) since every value is unique by design. – Frank Heikens Aug 14 '23 at 23:43
  • `count(quantity)` makes no sense. If `quantity` is always `1`, it's not really a "quantity". It's just redundant noise. And if it isn't, `count(quantity)` seems wrong. Please disclose actual table definitions showing data types and constraints: `CREATE TABLE` scripts. Tell us your version of Postgres and what the added measure is supposed to measure *exactly*, rather than what you chose to call it. Follow instructions for performance questions here: https://stackoverflow.com/tags/postgresql-performance/info Most importantly: is `dim1` defined `UNIQUE`? – Erwin Brandstetter Aug 15 '23 at 00:39
  • Sorry, it was my mistake when writing up the example. It should be `count(*)` instead of quantity. The actual tables and query is very large involving many tables, while the "child" has 2 tables. There are FK relationship with index for every parent.id and child.parent_id tables. Even then, any query that isn't being filtered first is extremely slow. – user3758745 Aug 15 '23 at 14:54
  • I added another query in Edit#2 that seems to work for the specific range I'm testing on. What do you think of it? – user3758745 Aug 15 '23 at 16:43

2 Answers2

2

Going out on a limb, you probably want this query:

SELECT p.dim1
     , sum(p.quantity) AS sum_quantity
     , sum(p.price) AS sum_price
     , COALESCE(sum(child_count), 0) AS sum_child_count
FROM   parent_table p
LEFT   JOIN (
   SELECT parent_id AS id, count(*) AS child_count
   FROM   child_table
   GROUP  BY 1
   ) c USING (id)
GROUP  BY dim1;

Filling in experience with this kind of questions for missing information.

Aggregate rows in the child table before joining to the parent table. See:

Throwing in COALESCE to convert possible null values.

Also converted to count(quantity) to sum(p.quantity) on suspicion. If quantity is always 1, use count(*) instead. And drop the column quantity.

An index on child_table.parent_id would help performance substantially - if you get an index-only scan out of it.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • I'm guessing there's no other way to write another query for the child count per parent, which is the main thing I'm looking for here. I was hoping there is some feature I didn't know in Postgresql, similar to the aggregate functions in DB2. – user3758745 Aug 15 '23 at 15:10
  • I added another query in Edit#2 that seems to work for the specific range I'm testing on. What do you think of it? – user3758745 Aug 15 '23 at 16:43
  • Depending on what you actually need, there is a better query. My only difficulty is the lack of information. I tried asking. – Erwin Brandstetter Aug 16 '23 at 00:10
0

Delete distinct from your query. This yields the same result, but is faster:

select dim1, count(p.quantity), sum(p.price), count(c.id) as child_count
from parent_table p
left join child_table c on c.parent_id = p.id
group by dim1

Being the PK of child_table, child_table.id is already unique so applying distinct will slow your query down for no benefit.

Bohemian
  • 412,405
  • 93
  • 575
  • 722