I'm working in an obscure language with poor dependency management. To help out 14000 file codebase, I've written some parsing tools (in Java) and have generated a dependency graph.
I wrote my own graph and BFS classes, and they work just fine. With them, I have such methods as getParents()
and getChildren()
.
Now I'm trying to find the 'islands' in this graph; that is to say, I'm trying to find what sections of our codebase are not dependent on each other, in the hopes of gathering them into isolated modules.
Later, I also plan to analyze the individual islands to see if there's any weak points in them where we can establish a module barrier and define that module's interface, but that's down the road.
Right now, I'm thinking of doing this:
Map<DependencyEntry, Set<DependencyEntry>> allChildren = new ...;
for(DependencyEntry entry : allFiles) allChildren.put(entry,getAllChildren(entry));
Set<DependencyEntry> visited = new ...;
Set<DependencyEntry> largest = new HashSet<DependencyEntry>(); // size 0
// slightly more expensive but more readable
for(DependencyEntry entry : allChildren.keySet()) {
Set<DependencyEntry> set = allChildren.get(key);
if(set.size() > largest.size()) largest = set;
}
visited.addAll(largest);
This should give me the largest island. From there, I can go through and exclude any sets that contain any visited nodes, and then run it again to get the next largest island, and so forth.
Is this an accurate algorithm? Is there a better way to solve this problem that I'm not seeing?