-1

Say we have a large SQLite table edge CREATE TABLE edge(from TEXT, to TEXT) which contains all the edges of a directed graph (could be cyclic). Each node in the graph has a unique id of type TEXT. How can we find all paths from a given node to all the connected nodes efficiently? Only 1 path is needed if there are multiple paths connecting 2 nodes. Since the graph is very large, get all possible paths first and remove duplicates by checking start/end pair is not efficient enough.

enter image description here

For example if the given node is A, the result (not unique) should be like below.

A->B
A->B->E
A->B->E->C
A->F

I came up with a recursive CTE solution but it becomes very slow if the table is too big. It seems that we could only UNION on all columns of the cte and it makes the recursive search very ineffective.

WITH RECURSIVE cte(to, chain) AS (
    SELECT edge.to, edge.to || '->' || edge.from FROM edge
        WHERE edge.from = 'A'
    UNION ALL
    SELECT edge.to, edge.to || '->' || cte.chain FROM edge
        JOIN cte
        ON cte.to = edge.from
)
SELECT MIN(chain) as selected_chain FROM cte
GROUP BY cte.to
ORDER BY selected_chain
wanghan02
  • 1,227
  • 7
  • 14

1 Answers1

1

The best way I can see to do this is a multi-pass solution. First step: create a temp table with A->B and A->F -- in other words, only the rows that have a from-node equal to your starting node. In each subsequent step, insert all the rows that have a from-node equal to any of the to-nodes in the temp table and where the new to-node is not already in the table. To eliminate duplicate to-nodes, make this select choose the max (or min, it doesn't matter) from-node. Each pass will add exactly one more row for each to-node that it adds, and every accessible to-node will eventually be added. Once you have this table you can implement a breadth-first search on the resulting reduced graph.

Mike Christie
  • 381
  • 1
  • 10