I am trying to trace a path in a river network graph by "walking" up it until a terminal node is reached. The problem is that when walking upstream, we encounter tributaries, or junctions, where our path can go multiple directions. I would like to specify the direction of the walk using an attribute of the links. I am trying to construct a CTE with postgres to achieve this, but cannot quite get there.
Here is a simple diagram of the network; reaches in red and nodes in blue:
Each reach has an area
attribute. This graph can be constructed via
CREATE TEMP TABLE tree (
id_reach integer PRIMARY KEY,
from_node integer UNIQUE NOT NULL,
to_node integer,
area float
);
INSERT INTO tree(id_reach, from_node, to_node, area) VALUES
(0, 1, 0, 40),
(1, 3, 1, 29),
(2, 2, 1, 5),
(3, 4, 3, 7),
(4, 5, 3, 6);
What I want to do
I would like to specify a node within this graph, then traverse upstream following the path of the reaches with the largest area
. So for the example shown, the outputs would look as
starting node | path |
---------------+---------
0 {0, 1, 3}
1 {1, 3}
3 {3}
I cannot figure out how to incorporate the area
information into the CTE. So far, the best I have come up with returns all the reaches upstream of the starting_node (which is to_node = 1
in the following):
WITH RECURSIVE us_path AS (
SELECT id_reach,
from_node,
to_node
FROM tree
WHERE to_node = 1
UNION ALL
SELECT tr.id_reach,
tr.from_node,
tr.to_node
FROM tree as tr
JOIN us_path AS usp ON tr.to_node = usp.from_node
)
SELECT id_reach
FROM us_path;
This of course returns
1
2
3
4
which are all the id_reach
s upstream.
How can I modify the CTE to only return a path that follows the id_reach
s with the largest area
?
What I have tried
I tried post-processing the results of the CTE above, but I could find no way to filter the results to return this path. I have tried inserting WHERE
and HAVING
statements into the CTE to try to select the reach with the largest area
, but could not get them to work either.
Update
Some great answers have been given below, all of which have taught me something new. The accepted answer is the simplest and fastest for my use case, although the other answers provide mechanisms for generalizing the problem a bit.