Aggregating connected sets of nodes / edges

Question

I have a connected set of edges with unique nodes. They are connected using a parent node. Consider the following example code and illustration:

CREATE TABLE network (
  node integer PRIMARY KEY,
  parent integer REFERENCES network(node),
  length numeric NOT NULL
);
CREATE INDEX ON network (parent);
INSERT INTO network (node, parent, length) VALUES
  (1, NULL, 1.3),
  (2, 1, 1.2),
  (3, 2, 0.9),
  (4, 3, 1.4),
  (5, 4, 1.6),
  (6, 2, 1.5),
  (7, NULL, 1.0);

connected set

Visually, two groups of edges can be identified. How can the two groups be identified using PostgreSQL 9.1, and length summed? The expected result is shown:

result

 edges_in_group | total_edges | total_length
----------------+-------------+--------------
 {1,2,3,4,5,6}  |           6 |          7.9
 {7}            |           1 |          1.0
(2 rows)

I don't even know where to begin. Do I need a custom aggregate or window function? Could I use WITH RECURSIVE to iteratively collect edges that connect? My real world case is a stream network of 245,000 edges. I expect the maximum number of edges_in_group to be less than 200, and a couple hundred aggregated groups (rows).

score 5 · Accepted Answer · answered Feb 27 '15 at 07:08

A recursive query is the way to go:

with recursive tree as (
  select node, parent, length, node as root_id
  from network
  where parent is null
  union all
  select c.node, c.parent, c.length, p.root_id
  from network c
    join tree p on p.node = c.parent
)
select root_id, array_agg(node) as edges_in_group, sum(length) as total_length
from tree
group by root_id;

The important thing is to keep the id of the root node in each recursion, so that you can group by that id in the final result.

The is a very fast an simple method! Scales impressively well on a real-world example. — Mike T, Feb 28 '15 at 00:41

Aggregating connected sets of nodes / edges

1 Answers1

Linked