0

I'm modeling a category hierarchy in postgres using the nested set model as explain here. I am using a query to find the immediate subordinates of a node which is explained below in an excerpt from the article:

Imagine you are showing a category of electronics products on a retailer web site. When a user clicks on a category, you would want to show the products of that category, as well as list its immediate sub-categories, but not the entire tree of categories beneath it. For this, we need to show the node and its immediate sub-nodes, but no further down the tree. For example, when showing the PORTABLE ELECTRONICS category, we will want to show MP3 PLAYERS, CD PLAYERS, and 2 WAY RADIOS, but not FLASH.

This can be easily accomplished by adding a HAVING clause to our previous query:

SELECT node.name, (COUNT(parent.name) - (sub_tree.depth + 1)) AS depth
FROM nested_category AS node,
        nested_category AS parent,
        nested_category AS sub_parent,
        (
                SELECT node.name, (COUNT(parent.name) - 1) AS depth
                FROM nested_category AS node,
                        nested_category AS parent
                WHERE node.lft BETWEEN parent.lft AND parent.rgt
                        AND node.name = 'PORTABLE ELECTRONICS'
                GROUP BY node.name
                ORDER BY node.lft
        )AS sub_tree WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND node.lft BETWEEN sub_parent.lft AND sub_parent.rgt
        AND sub_parent.name = sub_tree.name GROUP BY node.name HAVING depth <= 1 ORDER BY node.lft;

My schema differs slightly from the articles schema in the following ways:

  • I'm using postgres. The article is using MySql.
  • My category definitions live in a separate table
  • My hierarchy table is called category_tree. The article calls theirs nested_category

Here is my category table definition:

(
    category_id uuid NOT NULL,
    name text COLLATE pg_catalog."default" NOT NULL,
    description text COLLATE pg_catalog."default",
    created_at timestamp with time zone NOT NULL,
    updated_at timestamp with time zone NOT NULL,
    updated_by text COLLATE pg_catalog."default" NOT NULL,
    image_url text COLLATE pg_catalog."default",
    is_main boolean NOT NULL DEFAULT false,
    CONSTRAINT category_category_id_pk PRIMARY KEY (category_id),
    CONSTRAINT category_category_name_key UNIQUE (name)
)

Here is my category tree table definition:

(
    category_tree_id uuid NOT NULL,
    category_id uuid NOT NULL,
    lft integer NOT NULL,
    rgt integer NOT NULL,
    created_at timestamp with time zone NOT NULL,
    updated_at timestamp with time zone NOT NULL,
    updated_by text COLLATE pg_catalog."default" NOT NULL,
    CONSTRAINT category_tree_pkey PRIMARY KEY (category_tree_id)
)

I have translated the query in the article to fit the schema in my db to the following:

SELECT node.category_tree_id,
       node.category_id,
       cat.name,
       node.lft,
       node.rgt,
       ( Count(parent.category_tree_id) - ( sub_tree.depth + 1 ) ) AS depth,
       node.created_at,
       node.updated_at,
       node.updated_by
FROM   category_tree AS node
       JOIN category cat
         ON cat.category_id = node.category_id,
       category_tree AS parent,
       category_tree AS sub_parent,
       (SELECT node.category_tree_id,
               ( Count(parent.category_tree_id) - 1 ) AS depth
        FROM   category_tree AS node,
               category_tree AS parent
        WHERE  node.lft BETWEEN parent.lft AND parent.rgt
               AND node.category_tree_id = 'f47f6269-9605-459e-a319-2b0177f9f4d0' //lookup by category tree id
        GROUP  BY node.category_tree_id
        ORDER  BY node.lft) AS sub_tree
WHERE  ( node.lft BETWEEN parent.lft AND parent.rgt )
       AND ( node.lft BETWEEN sub_parent.lft AND sub_parent.rgt )
       AND ( sub_parent.category_tree_id = sub_tree.category_tree_id )
GROUP  BY node.category_tree_id,
          depth,
          cat.name
HAVING depth <= 1
ORDER  BY node.lft 

This query does not produce any results and the issue seems to be coming from the HAVING depth ... clause. I have two rows in my category_tree table who's depth value are 0 and 1, respectively as shown below:

"f47f6269-9605-459e-a319-2b0177f9f4d0", "d06a143b-523e-4136-8a17-1049abbf76f4", "Parent", 7, 10, 0
"14b939e9-6784-4905-ba24-f67768c57085", "21191930-a5b9-4868-883f-3798f29d70a3", "Child", 8, 9,  1

Oddly enough, if I change the clause HAVING depth > 1 I actually get the two results I'm expecting, which is obviously wrong.

What's my issue here?

PM 77-1
  • 12,933
  • 21
  • 68
  • 111
Joe Berg
  • 774
  • 2
  • 8
  • 21
  • Is there a reason you're using nested sets, which are quite difficult, instead of [recursive queries](https://www.postgresql.org/docs/current/queries-with.html)? – Bill Karwin Apr 28 '21 at 16:35
  • Recursive makes querying the tree inefficient as it gets large. I want flexibility in querying the tree – Joe Berg Apr 28 '21 at 16:36
  • 1
    You might like the Closure Table alternative I describe [here](https://stackoverflow.com/questions/192220/what-is-the-most-efficient-elegant-way-to-parse-a-flat-table-into-a-tree/192462#192462) or [here](https://www.slideshare.net/billkarwin/recursive-query-throwdown) or [here](https://pragprog.com/titles/bksqla/sql-antipatterns/). I have queried trees of 500k nodes in milliseconds. – Bill Karwin Apr 28 '21 at 16:38
  • PostgreSql or Mysql?? Why don´t you tag correctly?? – nacho Apr 28 '21 at 17:10
  • @nacho thought both were relevant since article is written for mysql and my question is for postgres – Joe Berg Apr 28 '21 at 17:11

2 Answers2

1

Answering my own question.

You can't use a column alias on the same level where you defined it and having is only allowed in a query that uses aggregation. If you want to avoid repeating the expression, use a derived table. -source

Full query:

SELECT node.category_tree_id,
       node.category_id,
       cat.name,
       node.lft,
       node.rgt,
       (Count(parent.category_tree_id) - ( sub_tree.depth + 1 )) AS depth,
       node.created_at,
       node.updated_at,
       node.updated_by
FROM   category_tree AS node
       JOIN category cat
         ON cat.category_id = node.category_id,
       category_tree AS parent,
       category_tree AS sub_parent,
       (SELECT node.category_tree_id,
               ( Count(parent.category_tree_id) - 1) AS depth
        FROM   category_tree AS node,
               category_tree AS parent
        WHERE  node.lft BETWEEN parent.lft AND parent.rgt
               AND node.category_tree_id = 'f47f6269-9605-459e-a319-2b0177f9f4d0'
        GROUP  BY node.category_tree_id
        ORDER  BY node.lft) AS sub_tree
WHERE  (node.lft BETWEEN parent.lft AND parent.rgt)
       AND (node.lft BETWEEN sub_parent.lft AND sub_parent.rgt)
       AND (sub_parent.category_tree_id = sub_tree.category_tree_id)
GROUP  BY node.category_tree_id,
          depth,
          cat.name
HAVING (Count(parent.category_tree_id) - ( sub_tree.depth + 1 )) <= 1
ORDER  BY node.lft 
Joe Berg
  • 774
  • 2
  • 8
  • 21
0

I tested your query on MySQL and I find that your depth alias defined in the outer query is ambiguous with the subquery's column subtree.depth.

If I rename the outer query's alias something distinct e.g. depth2, and reference it in HAVING depth2 <= 1 I get the two rows for "parent" and "child" as you would expect.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • If I try that I am getting the error: "ERROR: column "depth2" does not exist LINE 29: HAVING depth2 <= 1". Hard to imagine that postgres would behave differently than mysql in this case. I changed the alias on line 6 of my query to `AS depth2` and updated line 29 of my query to be `HAVING depth2 <= 1`. Did I misunderstand your answer? – Joe Berg Apr 28 '21 at 19:07
  • That's exactly what I described. Change the alias name and then make reference to the changed name. – Bill Karwin Apr 28 '21 at 19:15
  • When I do that I'm getting the error I described in my previous comment. – Joe Berg Apr 28 '21 at 19:17
  • Ah - this is indeed a postgres specific limitation. The reason I'm getting that error is because in postgres, where is evaluated before select, so you can't use alias declared in select in where clause. – Joe Berg Apr 28 '21 at 19:35
  • You can't use an alias in the WHERE clause in MySQL either. What does that have to do with the HAVING clause? – Bill Karwin Apr 28 '21 at 19:45
  • I was wrong. Correction: You can't use a column alias on the same level where you defined it and having is only allowed in a query that uses aggregation. If you want to avoid repeating the expression, use a derived table. source: https://dba.stackexchange.com/questions/281438/why-does-an-alias-with-a-having-clause-not-exist-in-postgresql – Joe Berg Apr 28 '21 at 19:51