1

I'm trying to identify bubble structures in a directed graph (orientation is left to right). I start walking the graph from a possible start node, for example the green node. Then I add one node to all paths, adding copies of paths when paths diverge again, like this:

first iteration: [[3],[7]]

Second iteration: [[3,4],[3,5],[7,8],[7,9]]

After every iteration I want to check whether any paths intersect and save them as confirmed bubbles. Currently I'm using a nested for-loop to compare every path to one another but the amount of paths can get very large and thus the script can get very slow. The order of the path matters.

Example bubble

Any suggestions on how to improve the speed of comparing a path to another path in a list of paths?

def ExtendPaths(paths, outEdges):
newPaths = []
for path in paths:                                              
    nextNodes = GetNextSegment(path[len(path) - 1], outEdges)   
    if len(nextNodes) == 0:
        j=5
    else:
        for node in nextNodes:                                      
            newPath = list(path)                                    
            newPath.append(node)                                    
            newPaths.append(newPath)                    
return newPaths

def ComparePaths(paths, putativeBubbleStartSegment):
length = len(paths)
for path1 in paths:
    for path2 in paths:
        if path2 != path1:
            if len(set(path1).intersection(path2)) > 0:
                #Bubble confirmed

Where GetNextSegment simply returns a list of nodes connected to the node given to the function (in this case, the last node of the path). outEdges is a dictionary with: node:[out,going,edges]. In ComparePaths() a bubble is confirmed when the length of the .intersection() of 2 paths is greater than 0.

A bubble is a graph structure where 2 paths diverge (from the green node, for example) and finally come together again. In this example, the bubble would go from 2 to 11 with all nodes between them.

I'm not asking for a complete bubble-finding algorithm, just for ideas on how to compare all paths to all other paths quickly.

1 Answers1

1

Instead of using a list of lists, consider using a set of tuples (if order matters) or a set of frozensets (if order does not matter). Initialize newPaths with newPaths = set(), then add each path as a tuple or frozenset (which are hashable) rather than a list:

for node in nextNodes:                                      
    newPath = tuple(path) + (node,)
    # or: newPath = frozenset(path).union({node})
    newPaths.add(newPath)

This should make it a bit faster to check membership and intersections.

Also it looks like you are checking the same paths multiple times by looping through paths twice. For example, if you have path1 equal to (3, 4) and path2 equal to (3, 5), you don't need to check (3, 4) versus (3, 5) and also (3, 5) versus (3, 4), since your check appears symmetrical. You could simplify ComparePaths by using an itertools helper:

from itertools import combinations

def ComparePaths(paths, putativeBubbleStartSegment):
    # This gets all combinations of paths without repeating any pairing.
    for path1, path2 in combinations(paths, 2)
        # Don't need to check the length of the intersection because an
        # empty set is always equivalent to "False" in an if statement.
        if set(path1).intersection(path2):
            # Bubble confirmed

It seems like your sample code is leaving out some details (since there are unused function arguments and variables), but what I see here doesn't seem like it should work for what you're trying to do. As a result, it's hard to suggest any other speed-ups, even though there might be other ways to improve your algorithm.

user108471
  • 2,488
  • 3
  • 28
  • 41
  • Thanks! good idea, I'll try that and see if it improves runtime a lot. – Stan Derksen Oct 19 '17 at 14:21
  • @StanDerksen After re-reading your code, I'm no longer sure the "set of tuples" solution will make the biggest difference, but it kind of looks like you might have left some code out? There are some variables you assign like `j = 5` or `length = len(paths)` that you don't use, so I'm not sure what that means for the rest of your algorithm. – user108471 Oct 19 '17 at 14:54