My goal
To calculate the sum of components sold as part of their parent assemblies.
I'm sure this must be a common use case, but I haven't yet found documentation that leads to the result I'm looking for.
Background
I'm running Postgres 11 on CentOS 7. I have some tables like as follows:
CREATE TABLE the_schema.names_categories (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
thing_name TEXT NOT NULL,
thing_category TEXT NOT NULL
);
CREATE TABLE the_schema.relator (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
parent_name TEXT NOT NULL,
child_name TEXT NOT NULL,
child_quantity INTEGER NOT NULL
);
CREATE TABLE the_schema.sales (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMPTZ DEFAULT now(),
sold_name TEXT NOT NULL,
sold_quantity INTEGER NOT NULL
);
And a view like so, which is mainly to associate the category key with relator.child_name for filtering:
CREATE VIEW the_schema.relationships_with_child_catetgory AS (
SELECT
r.parent_name,
r.child_name,
r.child_quantity,
n.thing_category AS child_category
FROM
the_schema.relator r
INNER JOIN
the_schema.names_categories n
ON r.child_name = n.thing_name
);
And these tables contain some data like this:
INSERT INTO the_schema.names_categories (thing_name, thing_category)
VALUES ('parent1', 'bundle'), ('child1', 'assembly'), ('subChild1', 'component'), ('subChild2', 'component');
INSERT INTO the_schema.relator (parent_name, child_name, child_quantity)
VALUES ('parent1', 'child1', 1),('child1', 'subChild1', 10), ('child1', 'subChild2', 2);
INSERT INTO the_schema.sales (sold_name, sold_quantity)
VALUES ('parent1', 1), ('parent1', 2);
I need to construct a query that, given these data, will return something like the following:
child_name | sum_sold
------------+----------
subChild1 | 30
subChild2 | 6
(2 rows)
The problem is that I haven't the first idea how to go about this and in fact it's getting scarier as I type. I'm having a really hard time visualizing the connections that need to be made, so it's difficult to get started in a logical way. Usually, Molinaro's SQL Cookbook has something to get started on, and it does have a section on hierarchical queries, but near as I can tell, none of them serve this particular purpose.
Based on my research on this site, it seems like I probably need to use a recursive CTE /Common Table Expression, as demonstrated in this question/answer, but I'm having considerable difficulty understanding this method and how to use this it for my case.
Aping the example from E. Brandstetter's answer linked above, I arrive at:
WITH RECURSIVE cte AS (
SELECT
s.sold_name,
r.child_name,
s.sold_quantity AS total
FROM
the_schema.sales s
INNER JOIN
the_schema.relationships_with_child_catetgory r
ON s.sold_name = r.parent_name
UNION ALL
SELECT
c.sold_name,
r.child_name,
(c.total * r.child_quantity)
FROM
cte c
INNER JOIN
the_schema.relationships_with_child_catetgory r
ON r.parent_name = c.child_name
) SELECT * FROM cte
which gets part of the way there:
sold_name | child_name | total
-----------+------------+-------
parent1 | child1 | 1
parent1 | child1 | 2
parent1 | subChild1 | 10
parent1 | subChild1 | 20
parent1 | subChild2 | 2
parent1 | subChild2 | 4
(6 rows)
However, these results include undesired rows (the first two), and when I try to filter the CTE by adding where r.child_category = 'component'
to both parts, the query returns no rows:
sold_name | child_name | total
-----------+------------+-------
(0 rows)
and when I try to group/aggregate, it gives the following error:
ERROR: aggregate functions are not allowed in a recursive query's recursive term
I'm stuck on how to get the undesired rows filtered out and the aggregation happening; clearly I'm failing to comprehend how this recursive CTE works. All guidance is appreciated!