Disjoint Set Union with directed graph

Question

I understand the DSU strictly works with undirected graphs from this stack overflow question - Can we detect cycles in directed graph using Union-Find data structure?

Nevertheless, I am currently working on a problem that involves 400000 queries and a graph with at most 400000 nodes, in which there are two possible queries:

Connect nodes a and b (directed, of course)
Output "true" if node "x" is reachable from node 1. Otherwise, print "false."

However, my original instinct was to use DSU; that obviously would not work. Any suggestions? Thank you.

[This post](https://cstheory.stackexchange.com/questions/14343/what-is-the-fastest-deterministic-algorithm-for-dynamic-digraph-reachability-wit) might be useful, but unless you have more constraints on the graph, there may not be very fast algorithms. This is called the incremental single-source reachability problem. Since it's an active area of research in computer science, the CS or TCS sites may be able to help you more. — kcsquared, Feb 08 '22 at 04:07

score 3 · Answer 1 · answered Feb 08 '22 at 16:42

3

As long as you don't have to delete connections, and you're only interested in reachability from a single source, then you can do this in amortized constant time per operation pretty easily.

Maintain a set of nodes that are known to be reachable from node 1. Initially this would just contain node 1. We'll call these nodes 'marked'.
When you connect a->b, if a is marked but b is not, then mark all the newly reachable nodes. This is a transitive closure operation:
- put b in a queue.
- while the queue is not empty, remove a vertex, mark it reachable, and put all of its unmarked neighbors in the queue.

answered Feb 08 '22 at 16:42

Matt Timmermans

53,709
3
46
87

This is the best answer; while getting worst-case performance guarantees might be hard, amortized time guarantees can be achieved. Technically you might need to compute the transitive closure or something in a preprocessing step to get that time bound. Consider a graph where nodes 2 through `n` form a complete graph and node 1 is disconnected: the first two operations will take n^2 time with this exact strategy, since you're performing BFS on the whole graph. – kcsquared Feb 09 '22 at 04:18

Alexander Svozil · Accepted Answer · 2022-02-08T08:31:41.500

What you want is a data structure for a problem called 'incremental reachability'. There are multiple ways to construct such a data structure, all have some different update/query time tradeoffs. A very simple way to achieve the goal is to use an adjacency list and use BFS every time a user queries if node "x" is reachable from 1. This gets you update time: O(1) and query time: O(m).

A more complicated idea is 'even-shiloach' trees [1], here a BFS tree is maintained efficiently. Total update time: O(nm) Query time: O(1).

An experimental analysis of similar algorithms can be found in [2].

[1] Shimon Even, Yossi Shiloach: An On-Line Edge-Deletion Problem. J. ACM 28(1): 1-4 (1981) https://dl.acm.org/doi/10.1145/322234.322235

[2] Fully Dynamic Single-Source Reachability in Practice: An Experimental Study, Kathrin Hanauer, Monika Henzinger and Christian Schulz https://arxiv.org/abs/1905.01216

Disjoint Set Union with directed graph

2 Answers2