2

I have a representation of a (binary) tree as a table in PostgreSQL. Each row is a node of the tree and each row has a parent column and a color.

The nodes at he leaves of the tree have a color (their color column is not null and can be either green or red).

My goal is to color the whole tree. With these rules:

  • If a parent has one child, then the parent's color is the child's color.
  • If a parent has two children and both have the same color then the parent color is its children color.
  • If a parent has two children and they have different color, then the parent color is gray.

How can I write a recursive query in PostgreSQL for this algorithm?

Example of a tree:

CREATE TEMPORARY TABLE family (
  node text
, parent text
, color text
);

INSERT INTO family (parent, node, color)
VALUES
  (null , 'A'   , null)
, ('A'  , 'AB'  , null)
, ('A'  , 'AC'  , null)
, ('AB' , 'ABD' , null)
, ('ABD', 'ABDE', 'green')
, ('AC' , 'ACF' , null)
, ('AC' , 'ACG' , null)
, ('ACF', 'ACFH', 'red')
, ('ACF', 'ACFI', 'green')
, ('ACG', 'ACGJ', 'red')
, ('ACG', 'ACGK', 'red')
;

This tree, fully colored:

A colored tree

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
homam
  • 1,945
  • 1
  • 19
  • 26

2 Answers2

1

You can use a recursive cte, aggregating after you perform the join on the parent node in a subquery:

with recursive cte(parent, node, color) as (
   select t.parent, t.node, case when t.c1 = t.c2 then t.c1 else 'gray' end    
   from (select f1.parent, f1.node, max(f.color) c1, min(f.color) c2 
         from family f 
         join family f1 on f1.node = f.parent where f.color is not null 
         group by f1.parent, f1.node) t
   union all
   select t1.parent, t1.node, case when t1.c1 = t1.c2 then t1.c1 else 'gray' end  
   from (select t.parent, t.node, max(t.color) c1, min(t.color) c2 
         from (select f.parent, f.node, c.color from cte c 
               join family f on f.node = c.parent) t 
         group by t.parent, t.node) t1
)
select c.parent, c.node, c.color from cte c
union all
select f.parent, f.node, f.color from family f where f.color is not null

See fiddle.

Ajax1234
  • 69,937
  • 8
  • 61
  • 102
  • Thanks, but something seems to be wrong with this algorithm. Here's a sample of my actual data: https://dbfiddle.uk/wynbYEbp Node '91.73.128.0/17' must be gray, but it is repeated three times in the result set, once blue, other times red. – homam Apr 07 '23 at 07:25
  • @homam Does your dataset represent a complete binary tree? – Ajax1234 Apr 07 '23 at 14:48
  • No, the table contains many distinct trees. So there are several roots. – homam Apr 07 '23 at 16:14
1

A recursive CTE does not allow aggregation in the recursive term.

I suggest to update the original table one level at a time, top-down:

DO
$do$
DECLARE
   _max_depth int := (SELECT max(length(node)) - 1 FROM family);
   _depth int;
BEGIN
   FOR _depth IN REVERSE _max_depth .. 1 LOOP
      UPDATE family p
      SET    color = c.color
      FROM  (
         SELECT parent, CASE WHEN min(color) = max(color) THEN min(color) ELSE 'gray' END AS color
         FROM   family
         WHERE  length(node) = _depth + 1
         GROUP  BY 1
         ) c
      WHERE  p.node = c.parent
      AND    p.color IS DISTINCT FROM c.color;  -- optional
   END LOOP;
END
$do$;

fiddle

If you don't want to touch the original table, operate on a copy of the table, maybe a TEMPORARY table for best performance.

I use length(node) as indicator of the nesting level. If that's unreliable, replace it with a reliable nesting level. Either way, if the table is big and there are many levels, an index on the level indicator will help performance. For the demo that would be:

CREATE INDEX family_level_idx ON family (length(node));

The added WHERE clause p.color IS DISTINCT FROM c.color skips empty updates. See:

Related:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228