1

I'm writing a recursive breadth-first traversal of a network. The problem I ran into is that the network often looks like this:

    1
   / \
  2   3
   \ /
    4
    |
    5

So my traversal starts at 1, then traverses to 2, then 3. The next stop is to proceed to 4, so 2 traverses to 4. After this, 3 traverses to 4, and suddenly I'm duplicating work as both lines try to traverse to 5.

The solution I've found is to create a list called self.already_traversed, and every time a node is traversed, I append it to the list. Then, when I'm traversing from node 4, I check to make sure it hasn't already been traversed.

The problem here is that I'm using an instance variable for this, so I need a way to set up the list before the first recursion and a way to clean it up afterwards. The way I'm currently doing this is:

self.already_traversed = []
self._traverse_all_nodes(start_id)
self.already_traversed = []

Of course, it sucks to be twoggling variables outside of the function that's using them. Is there a better way to do this so this can be put into my traversal function?


Here's the actual code, though I recognize it's a bit dense:

def _traverse_all_nodes(self, root_authority, max_depth=6):
    """Recursively build a networkx graph

    Process is:
     - Work backwards through the authorities for self.cluster_end and all
       of its children.
     - For each authority, add it to a networkx graph, if:
        - it happened after self.cluster_start
        - it's in the Supreme Court
        - we haven't exceeded a max_depth of six cases.
        - we haven't already followed this path
    """
    g = networkx.Graph()
    if hasattr(self, 'already_traversed'):
        is_already_traversed = (root_authority.pk in self.visited_nodes)
    else:
        # First run. Create an empty list.
        self.already_traversed = []
        is_already_traversed = False

    is_past_max_depth = (max_depth <= 0)
    is_cluster_start_obj = (root_authority == self.cluster_start)
    blocking_conditions = [
        is_past_max_depth,
        is_cluster_start_obj,
        is_already_traversed,
    ]
    if not any(blocking_conditions):
        print "  No blocking conditions. Pressing on."
        self.visited_nodes.append(root_authority.pk)
        for authority in root_authority.authorities.filter(
                docket__court='scotus',
                date_filed__gte=self.cluster_start.date_filed):

            g.add_edge(root_authority.pk, authority.pk)
            # Combine our present graph with the result of the next
            # recursion
            g = networkx.compose(g, self._build_graph(
                authority,
                max_depth - 1,
            ))

    return g

def add_clusters(self):
    """Do the network analysis to add clusters to the model.

    Process is to:
     - Build a networkx graph
     - For all nodes in the graph, add them to self.clusters
    """
    self.already_traversed = []
    g = self._traverse_all_nodes(
        self.cluster_end,
        max_depth=6,
    )
    self.already_traversed = []
mlissner
  • 17,359
  • 18
  • 106
  • 169
  • Why not just pass a reference to your list to your recursive call? – Tripp Kinetics Sep 02 '15 at 19:25
  • Is there a way to do that so that each recursive call references the same list? For example, let's say 1 traversed to 2 and 3, and 2 traversed to 4. Next step is for 3 to traverse to 4, but the list it received when it was called only said that 1 had been traversed, so it would add itself to the list, and then proceed to 4. Unless I'm missing a trick here, (and it's possible -- I suck at recursion), each call of the function has it's own copy of the list. – mlissner Sep 02 '15 at 19:35
  • Do you have code where you can show this behavior? – Tripp Kinetics Sep 02 '15 at 19:45
  • OK, just studied this post [about passing by reference/assignment](https://stackoverflow.com/questions/986006/how-do-i-pass-a-variable-by-reference), and I see Python has a neat trick I never knew about. Totally works just to pass in the list, thanks. If you want some points, make your comment an answer, and I'll send 'em your way. Thanks for the help! I love SO and I love Python! Incredible. – mlissner Sep 02 '15 at 19:49
  • By the way - networkx has commands that do lots of different traversals. It returns a list(I think?) of the nodes in the order of whichever traversal you want. – Joel Sep 02 '15 at 21:17
  • True, but this code is used to build the nx DiGraph from the data in the DB. I need the traversal *before* I have the nx object, unfortunately. Now...if Django had a way to do recursive queries...that'd be nice. – mlissner Sep 02 '15 at 21:20

1 Answers1

1

Check out:

How do I pass a variable by reference?

which contains an example on how to past a list by reference. If you pass the list by reference, every call to your function will refer to the same list.

Tripp Kinetics
  • 5,178
  • 2
  • 23
  • 37