2

I have an edges table in my PostgreSQL database that represents the edges of a directed graph, with two columns: node_from and node_to (value is a node's id).

Given a single node (initial_node) I'd like to be able to traverse the entire graph, but in an undirected way.

What I mean is, for instance for the following graph :

(a->b)
(c->b)
(c->d)

If initial_node is a, b, c, or d, in any case, I would get [a, b, c, d].

I used the following SQL query (based on http://www.postgresql.org/docs/8.4/static/queries-with.html ):

WITH RECURSIVE search_graph(uniq, depth, path, cycle) AS (
        SELECT
          CASE WHEN g.node_from = 'initial_node' THEN g.node_to ELSE g.node_from END,
          1,
          CASE WHEN g.node_from = 'initial_node' THEN ARRAY[g.node_from] ELSE ARRAY[g.node_to] END,
          false
        FROM edges g
        WHERE 'initial_node' in (node_from, node_to)
      UNION ALL
        SELECT
          CASE WHEN g.node_from = sg.uniq THEN g.node_to ELSE g.node_from END,
          sg.depth + 1,
          CASE WHEN g.node_from = sg.uniq THEN path || g.node_from ELSE path || g.node_to END,
          g.node_to = ANY(path) OR g.node_from = ANY(path)
        FROM edges g, search_graph sg
        WHERE sg.uniq IN (g.node_from, g.node_to) AND NOT cycle
)
SELECT * FROM search_graph

It worked fine... Until I had a case with 12 nodes that are all connected together, in all directions (for each pair I have both (a->b) and (b->a)), which makes the query loops indefinitely. (Changing UNION ALL to UNION doesn't eliminate the looping.)

Has anyone any piece of advice to handle this issue?

Cheers,

Antoine.

Antoine Dusséaux
  • 3,740
  • 3
  • 23
  • 28

1 Answers1

1

I got to this, it should not get into infinite loops with any kind of data:

--create temp table edges ("from" text, "to" text);
--insert into edges values ('initial_node', 'a'), ('a', 'b'), ('a', 'c'), ('c', 'd');

with recursive graph(points) as (
  select array(select distinct "to" from edges where "from" = 'initial_node')
  union all
  select g.points || e1.p || e2.p
  from graph g
  left join lateral (
    select array(
      select distinct "to"
      from edges 
      where "from" =any(g.points) and "to" <>all(g.points) and "to" <> 'initial_node') AS p) e1 on (true)
  left join lateral (
    select array(
      select distinct "from"
      from edges 
      where "to" =any(g.points) and "from" <>all(g.points) and "from" <> 'initial_node') AS p) e2 on (true)
  where e1.p <> '{}' OR e2.p <> '{}'
  )
select distinct unnest(points)
from graph
order by 1

Recursive queries are very limiting in terms of what can be selected, and since they don't allow using the recursive results inside a subselect, one can't use NOT IN (select * from recursive where...). Storing results in an array, using LEFT JOIN LATERAL and using =ANY() and <>ALL() solved this conundrum.

Ezequiel Tolnay
  • 4,302
  • 1
  • 19
  • 28
  • 1
    Thanks a lot Ziggy ! The query doesn't loop indefinitely for the most complex cases and even for simple graphs, it's twice as fast as my previous implementation. – Antoine Dusséaux May 23 '16 at 09:28
  • Just to add something, in my case for the initial part it's: SELECT DISTINCT "to" AS "uniq" FROM edges WHERE "from" = 'initial_node' UNION SELECT DISTINCT "to" AS "uniq" FROM edges WHERE "to" = 'initial_node') – Antoine Dusséaux May 24 '16 at 07:41
  • I also had to add initial_node to the array (unnsest(points||array('initial_node')) because in some cases, I wouldn't get it in the result. – quambo Mar 23 '20 at 13:05