I am looking for a more efficient way to prune a directed acyclic graph (a DAG) constructed in jgrapht.
The DAG represents the relationships between a set of network conversations in time. The parents of a conversation are any conversations which completed before the child conversation started. Constructing the DAG is relatively straightforward, but there are a lot of unnecessary relationships. For efficiency, I want to prune the DAG so each child has a direct relationship to the minimal number of parents (or conversely, so each parent has the minimal number of immediate children).
The prune implementation I am using now (shown below) is based on code found in streme. It works for all of my manually constructed unit test scenarios. However, in a real data set, it is often fairly slow. I ran across a scenario today with 215 vertices but over 22,000 edges. Pruning that DAG took almost 8 minutes of clock time on server-class hardware -- tolerable for my immediate use case, but too slow to scale for larger scenarios.
I believe my problem is similar to the one described in What algorithm can I apply to this DAG? and Algorithm for Finding Redundant Edges in a Graph or Tree. That is, I need to find the transitive reduction or the minimal representation for my DAG. jgrapht does not appear to contain a direct implementation of transitive reduction for a DAG, only transitive closure.
I am looking for suggestions about how to improve the efficiency of the implementation below, or perhaps a pointer to an existing implementation of transitive reduction for jgrapht that I could use instead.
Note: Alternately, if there is a different graphing library for Java that includes a native implementation of transitive reduction, I could switch to that library. My use of jgrapht is confined to a single 200-line class, so swapping it out should not be difficult as long as the interface is similar. To maintain the class interface (persisted to a database), I need a DAG implementation that provides a way to get the parents and children of a given node -- similar to jgrapht's Graphs.predecessorListOf()
and Graphs.successorListOf()
.
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.jgrapht.DirectedGraph;
import org.jgrapht.Graphs;
import org.jgrapht.alg.DijkstraShortestPath;
public static <V, E> void prune(DirectedGraph<V, E> dag) {
Deque<V> todo = new ArrayDeque<V>(dag.vertexSet());
Set<V> seen = new HashSet<V>();
while (!todo.isEmpty()) {
V v = todo.pop();
if (seen.contains(v)) {
continue;
}
seen.add(v);
List<V> targets = Graphs.successorListOf(dag, v);
for (int i = 0; i < targets.size(); i++) {
for (int j = i; j < targets.size(); j++) {
V vi = targets.get(i);
V vj = targets.get(j);
List<E> path = DijkstraShortestPath.findPathBetween(dag, vi, vj);
if (path != null && !path.isEmpty()) {
E edge = dag.getEdge(v, vj);
dag.removeEdge(edge);
}
}
}
}
}