6

I am new to WITH RECURSIVE in PostgreSQL. I have a reasonably standard recursive query that is following an adjacency list. If I have, for example:

1 -> 2
2 -> 3
3 -> 4
3 -> 5
5 -> 6

it produces:

1
1,2
1,2,3
1,2,3,4
1,2,3,5
1,2,3,5,6

What I would like is to have just:

1,2,3,4
1,2,3,5,6

But I can't see how to do this in Postgres. This would seem to be "choose the longest paths" or "choose the paths that are not contained in another path". I can probably see how to do this with a join on itself, but that seems quite inefficient.

An example query is:

WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
   SELECT g.id, g.link, g.data, 1, ARRAY[g.id], false
   FROM graph g
  UNION ALL
   SELECT g.id, g.link, g.data, sg.depth + 1, path || g.id, g.id = ANY(path)
   FROM graph g, search_graph sg
   WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
brechmos
  • 1,278
  • 1
  • 11
  • 22
  • 1
    Search for numbers that do not have children. Build from found childless numbers up. – Ihor Romanchenko Apr 22 '15 at 18:42
  • 2
    BTW: your intended output `1,2,3,4 | 1,2,3,5,6` *cannot* exist, since each node only has one `link` field, and thus only one successor. ('3' has both '4' and '5' as successors. – wildplasser Apr 22 '15 at 19:48
  • Stupid user interface: I misclicked and marked Erwins comment as offensive :-) – wildplasser Apr 22 '15 at 20:25
  • The other question was ltree specific. That one I haven't got back to. The answer below was helpful. – brechmos Apr 22 '15 at 20:28
  • Here is a related answer to a related question: http://stackoverflow.com/a/29738605/939860. Is your case "simple" as your demo suggest, less simple or an undirected graph where even cycles are possible? – Erwin Brandstetter Apr 22 '15 at 20:28
  • I hope it will be helpful for future readers. It will not solve your Original Question, though. (see my comment above) – wildplasser Apr 22 '15 at 20:29

3 Answers3

3

You already have a solution at your fingertips with cycle, just add a predicate at the end.

But adjust your break condition by one level, currently you are appending one node too many:

WITH RECURSIVE search AS (
   SELECT id, link, data, ARRAY[g.id] AS path, (link = id) AS cycle
   FROM   graph g
   WHERE  NOT EXISTS (
      SELECT 1
      FROM   graph
      WHERE  link = g.id
      )

   UNION ALL
   SELECT g.id, g.link, g.data, s.path || g.id, g.link = ANY(s.path)
   FROM   search s
   JOIN   graph g ON g.id = s.link
   WHERE  NOT s.cycle
   )
SELECT *
FROM   search
WHERE cycle;
-- WHERE cycle IS NOT FALSE;  -- alternative if link can be NULL
  • Also including a start condition like mentioned by @wildplasser.

  • Init condition for cycle is (link = id) to catch shortcut cycles. Not necessary if you have a CHECK constraint to disallow that in your table.

  • The exact implementation depends on the missing details.

  • This is assuming all graphs are terminated with a cycle or link IS NULL and there is a FK constraint from link to id in the same table. The exact implementation depends on missing details. If link is not actually a link (no referential integrity), you need to adapt ...

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
2

Just add the extra clause to the final query, like in:

WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
   SELECT g.id, g.link, g.data, 1, ARRAY[g.id], false
    FROM graph g
    -- BTW: you should add a START-CONDITION here, like:
    -- WHERE g.id = 1
    -- or even (to find ALL linked lists):
    -- WHERE NOT EXISTS ( SELECT 13
          -- FROM graph nx
          -- WHERE nx.link = g.id
          -- )
  UNION ALL
     SELECT g.id, g.link, g.data, sg.depth + 1, path || g.id, g.id = ANY(path)
    FROM graph g, search_graph sg
    WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph sg
WHERE NOT EXISTS ( -- <<-- extra condition
   SELECT 42 FROM graph nx
   WHERE nx.id = sg.link
    );

Do note that:

  • the not exists(...) -clause tries to join exactly the same record as the second leg of the recursive union.
  • So: they are mutually exclusive.
  • if it would exist, it should have been appended to the "list" by the recursive query.
wildplasser
  • 43,142
  • 8
  • 66
  • 109
  • What if the results are '1,2,3,4' and '1,2,4,5'? Won't the first one be ignored although it's not a part of the other one? – Jakub Kania Apr 22 '15 at 17:51
  • No, the not exists-clause tries to join *exactly* the same record as the second "leg" of the recursive union. So: they are mutually exclusive. (if it would exist, it should have been appended to the "list") – wildplasser Apr 22 '15 at 17:57
  • Thank you. This is helpful, I was just thinking of a join which would be painful. And +1 for using 42. – brechmos Apr 22 '15 at 20:28
  • maybe by using ltree; but the plain recursive query will always yield the partial results, too. (as you know, I know) @ErwinBrandstetter – wildplasser Apr 22 '15 at 21:22
0

I'm not sure that this should be considered as ugly join solution.

WITH recursive graph (child, parent) AS (
    SELECT 2, 1
    UNION
    SELECT 3, 2
    UNION
    SELECT 4, 2
    UNION
    SELECT 6, 5
    UNION
    SELECT 7, 6
    UNION
    SELECT 6, 7
),
paths (start, node, depth, path, has_cycle, terminated) AS (
    SELECT
        ARRAY[g1.parent],
        false,
        false
    FROM graph g1
    WHERE true
        AND NOT EXISTS (SELECT 1 FROM graph g2 WHERE g1.parent = g2.child)
    UNION ALL
    SELECT
        p.path || g.child,
        g.child = ANY(p.path),
        g.parent is null AS terminated
    FROM paths p
    LEFT OUTER JOIN graph g ON g.parent = p.node
    WHERE NOT has_cycle
)
SELECT * from path WHERE terminated
;

So the trick is to use terminated column via using LEFT OUTER JOIN and then select only terminated paths.

user1685095
  • 5,787
  • 9
  • 51
  • 100