1

My goal

To calculate the sum of components sold as part of their parent assemblies.

I'm sure this must be a common use case, but I haven't yet found documentation that leads to the result I'm looking for.

Background

I'm running Postgres 11 on CentOS 7. I have some tables like as follows:

CREATE TABLE the_schema.names_categories (
    id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    created_at TIMESTAMPTZ DEFAULT now(),
    thing_name TEXT NOT NULL, 
    thing_category TEXT NOT NULL
);

CREATE TABLE the_schema.relator (
    id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    created_at TIMESTAMPTZ DEFAULT now(),
    parent_name TEXT NOT NULL, 
    child_name TEXT NOT NULL,
    child_quantity INTEGER NOT NULL 
);

CREATE TABLE the_schema.sales (
    id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    created_at TIMESTAMPTZ DEFAULT now(),    
    sold_name TEXT NOT NULL,
    sold_quantity INTEGER NOT NULL
);

And a view like so, which is mainly to associate the category key with relator.child_name for filtering:

CREATE VIEW the_schema.relationships_with_child_catetgory AS (
    SELECT 
    r.parent_name,
    r.child_name, 
    r.child_quantity,
    n.thing_category AS child_category
    FROM 
    the_schema.relator r 
    INNER JOIN 
    the_schema.names_categories n 
    ON r.child_name = n.thing_name 
);

And these tables contain some data like this:

INSERT INTO the_schema.names_categories (thing_name, thing_category)
VALUES ('parent1', 'bundle'), ('child1', 'assembly'), ('subChild1', 'component'), ('subChild2', 'component');

INSERT INTO the_schema.relator (parent_name, child_name, child_quantity)
VALUES ('parent1', 'child1', 1),('child1', 'subChild1', 10), ('child1', 'subChild2', 2);

INSERT INTO the_schema.sales (sold_name, sold_quantity)
VALUES ('parent1', 1), ('parent1', 2);

I need to construct a query that, given these data, will return something like the following:

 child_name | sum_sold 
------------+----------
 subChild1  |       30
 subChild2  |        6
(2 rows)

The problem is that I haven't the first idea how to go about this and in fact it's getting scarier as I type. I'm having a really hard time visualizing the connections that need to be made, so it's difficult to get started in a logical way. Usually, Molinaro's SQL Cookbook has something to get started on, and it does have a section on hierarchical queries, but near as I can tell, none of them serve this particular purpose.

Based on my research on this site, it seems like I probably need to use a recursive CTE /Common Table Expression, as demonstrated in this question/answer, but I'm having considerable difficulty understanding this method and how to use this it for my case.

Aping the example from E. Brandstetter's answer linked above, I arrive at:

WITH RECURSIVE cte AS (
    SELECT 
    s.sold_name,
    r.child_name, 
    s.sold_quantity AS total 
    FROM 
    the_schema.sales s
    INNER JOIN 
    the_schema.relationships_with_child_catetgory r 
    ON s.sold_name = r.parent_name

    UNION  ALL

    SELECT 
    c.sold_name, 
    r.child_name, 
    (c.total  *  r.child_quantity)
    FROM 
    cte c 
    INNER JOIN 
    the_schema.relationships_with_child_catetgory r
    ON r.parent_name = c.child_name
) SELECT * FROM cte 

which gets part of the way there:

 sold_name | child_name | total 
-----------+------------+-------
 parent1   | child1     |     1
 parent1   | child1     |     2
 parent1   | subChild1  |    10
 parent1   | subChild1  |    20
 parent1   | subChild2  |     2
 parent1   | subChild2  |     4
(6 rows)

However, these results include undesired rows (the first two), and when I try to filter the CTE by adding where r.child_category = 'component' to both parts, the query returns no rows:

 sold_name | child_name | total 
-----------+------------+-------
(0 rows)

and when I try to group/aggregate, it gives the following error: ERROR: aggregate functions are not allowed in a recursive query's recursive term

I'm stuck on how to get the undesired rows filtered out and the aggregation happening; clearly I'm failing to comprehend how this recursive CTE works. All guidance is appreciated!

Mister October
  • 165
  • 1
  • 9

2 Answers2

1

Basically you have the solution. If you stored the quantities and categories in your CTE as well, you can simply add a WHERE filter and a SUM aggregation afterwards:

SELECT
    child_name,
    SUM(sold_quantity * child_quantity)
FROM cte
WHERE category = 'component'
GROUP BY child_name

My entire query looks like this (which only differs in the details I mentioned above from yours):

demo:db<>fiddle

WITH RECURSIVE cte AS (
    SELECT 
        s.sold_name,
        s.sold_quantity,
        r.child_name,
        r.child_quantity,
        nc.thing_category as category
    FROM 
        sales s
    JOIN relator r
    ON s.sold_name = r.parent_name
    JOIN names_categories nc
    ON r.child_name = nc.thing_name
    
    UNION ALL
    
    SELECT
        cte.sold_name,
        cte.sold_quantity,
        r.child_name,
        r.child_quantity,
        nc.thing_category
    FROM cte
    JOIN relator r ON cte.child_name = r.parent_name
    JOIN names_categories nc
    ON r.child_name = nc.thing_name
)
SELECT
    child_name,
    SUM(sold_quantity * child_quantity)
FROM cte
WHERE category = 'component'
GROUP BY child_name

Note: I didn't use your view, because I found it more handy to fetch the data from directly from the tables instead of joining data I already have. But that's just the way I personally like it :)

S-Man
  • 22,521
  • 7
  • 40
  • 63
0

Well, I figured out that the CTE can be used as a subquery, which permits the filtering and aggregation that I needed :

SELECT
cte.child_name, 
sum(cte.total)
FROM
(
WITH RECURSIVE cte AS (
    SELECT 
    s.sold_name,
    r.child_name, 
    s.sold_quantity AS total 
    FROM 
    the_schema.sales s
    INNER JOIN 
    the_schema.relationships_with_child_catetgory r 
    ON s.sold_name = r.parent_name

    UNION  ALL

    SELECT 
    c.sold_name, 
    r.child_name, 
    (c.total  *  r.child_quantity)
    FROM 
    cte c 
    INNER JOIN 
    the_schema.relationships_with_child_catetgory r
    ON r.parent_name = c.child_name

) SELECT * FROM cte  ) AS cte
INNER JOIN 
the_schema.relationships_with_child_catetgory r1 
ON cte.child_name = r1.child_name 
WHERE r1.child_category = 'component'
GROUP BY cte.child_name
;

which gives the desired rows:

 child_name | sum 
------------+-----
 subChild2  |   6
 subChild1  |  30
(2 rows)

Which is good and probably enough for the actual case at hand-- but I suspect there's a clearner way to go about this, so I'll be eager to read all other offered answers.

Mister October
  • 165
  • 1
  • 9