2

I am trying to get an array of categories associated with each product and then also get the top-level parent category of each product in another column, which by my logic is finding the same values for the categories array, but only selecting where parent_id is NULL which should pull back only one value and 1 record per id.

I really don't know the best way to structure this query. What I have kind of works, but it also shows NULL values in the parent category column for the categories that do have a parent ID and makes a second record for each product because I am forced to put it in the group by. Basically, I think I am not doing this in the correct or most efficient way.

Desired result:

+----+----------------+------------------+------------------------------------------------+------------------+
| id | name           | category_ids     | category_names                                 | parent_category  |
+----+----------------+------------------+------------------------------------------------+------------------+
| 1  | Product Name 1 | {111,222,333}    | {Electronics, computers, computer accessories} | Electronics      |
+----+----------------+------------------+------------------------------------------------+------------------+

My current query (which is not ideal):

select p.id, 
p.name, 
array_agg(category_id) as category_ids,
regexp_replace(array_agg(c.name)::text,'"|''','','gi') as category_names,
c1.name as parent_category
from products p
join product_categorizations pc  on pc.product_id = p.id
join categories c  on pc.category_id = c.id
full outer join (
   select name, id from categories
   where parent_id is null and name is not null
   ) c1 on c.id = c1.id
group by 1,2,5;
+----+----------------+------------------+-----------------------------------+------------------+
| id | name           | category_ids     | category_names                    | parent_category  |
+----+----------------+------------------+-----------------------------------+------------------+
| 1  | Product Name 1 | {111}            | {Electronics}                     | Electronics      |
+----+----------------+------------------+-----------------------------------+------------------+
| 1  | Product Name 1 | {222,333}        | {computers, computer accessories} | NULL             |
+----+----------------+------------------+-----------------------------------+------------------+
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Cole
  • 99
  • 10
  • Try using `max(c1.name)` and drop the `group by 5`. – clamp Apr 07 '20 at 19:58
  • @clamp This is perfect! Thank you! I knew I was missing something. Is doing the join with a nested select the best way to do this? I tried doing self joins and other things, but this is the closest I was able to get. – Cole Apr 07 '20 at 20:06
  • Cou can write it using a `WITH` clause or 'CTE' [doc](https://www.postgresql.org/docs/12/queries-with.html) – clamp Apr 07 '20 at 21:00
  • @clamp Thanks! I'll have to look into those because I am not too familiar with them. – Cole Apr 07 '20 at 21:18

1 Answers1

0

Replace the FULL JOIN with an aggregate FILTER clause:

SELECT p.id
     , p.name
     , array_agg(pc.category_id) AS category_ids
     , string_agg(c.name, ', ')  AS category_names  -- regexp_replace .. ?
     , min(c.name) FILTER (WHERE c.parent_id IS NULL) AS parent_category
FROM   products                p
JOIN   product_categorizations pc ON pc.product_id = p.id
JOIN   categories              c  ON pc.category_id = c.id
GROUP  BY p.id;

See:

(Why would you add AND name IS NOT NULL? Either way, min() ignores NULL values anyway.)

While aggregating all products, and while referential integrity is enforced, this should be a bit faster:

SELECT p.name, pc.*
FROM   products p
JOIN  (
   SELECT pc.product_id AS id
        , array_agg(pc.category_id) AS category_ids
        , string_agg(c.name, ', ')  AS category_names
        , min(c.name) FILTER (WHERE c.parent_id IS NULL) AS parent_category
   FROM   product_categorizations pc
   JOIN   categories              c  ON pc.category_id = c.id
   GROUP  BY 1
   ) pc  USING (id);

The point being that product only joins after aggregating rows.

Aside: "name" is not a very helpful column name. Related:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thank you so much for this answer! This is exactly what I was looking for. I knew there must be a better way to build my query and the way you explained everything is super helpful! The name IS NOT NULL thing was some dumb part I added and forgot to remove. Forgive me! – Cole Apr 08 '20 at 17:04