0

I cannot, for the life of me, figure out how Python thinks my dict is changing size during iteration. I have tried reworking the code a few times, and cannot figure out where I would be modifying the dictionary.

Overview: I am working on a hierarchical clustering algorithm where I build an MST out of a graph I have created, and then remove edges that are weaker than a specified threshold. This all seems to work fine, but now when I am going through and computing the clusters (a list of lists), I run into this really odd error. Below is my code:

def compute_clusters(self):
    """ wrapper function to compute clusters via DFS """
    mst = self.mst
    total_nodes = len(mst.keys())
    visited = set()
    for node in mst.keys():
        if node not in visited:
            self.clusters += self.cluster_dfs(mst, node, visited)

def cluster_dfs(self, mst, node, visited, cluster=[]):
    """ creates clusters through dfs """
    cluster.append(node)
    if self.dfs_finished(mst, node, visited):
        return cluster
    for neighbor in self.mst[node].keys():
        if neighbor not in visited:
            visited.add(neighbor)
            cluster.append(neighbor)
            self.cluster_dfs(mst, neighbor, visited, cluster)

def dfs_finished(self, mst, node, visited):
    for neighbor in self.mst[node].keys():
        if neighbor not in visited:
            return False
    return True

Basically, mst is a copy of my MST (defaultdict(dict)), it maps all of the nodes to their neighbors:weights.

I figured an easy approach would be to perform a DFS from each node in the MST that has not been touched yet by the DFS. By this logic, the recursion would only return after all elements in that particular cluster have been visited. And then it goes to the next cluster and does a DFS.

My RunTime error is:

Traceback (most recent call last):
File "cluster.py", line 91, in <module>
  cluster.build_clusters()
File "cluster.py", line 25, in build_clusters
  self.compute_clusters() # compute final clusters
File "cluster.py", line 66, in compute_clusters
  for node in mst.keys():
RuntimeError: dictionary changed size during iteration

Does anyone see a spot where I could maybe be accidentally modifying the dict? Apologies if it is a dumb mistake - I am a bit tired.

Jack Ryan
  • 1,287
  • 12
  • 26
  • 3
    Not directly related to your problem, but I'm concerned about your function definition that contains `cluster=[]`. Are you aware of the [default mutable argument gotcha](http://stackoverflow.com/questions/1132941/least-astonishment-and-the-mutable-default-argument)? It can cause some hard-to-detect problems if it catches you by surprise. – Kevin Nov 30 '16 at 19:26
  • should the last line of `cluster_dfs` have a `return`? Otherwise you'll be adding `None` s into `self.clusters` – Patrick Haugh Nov 30 '16 at 19:29
  • I don't see any obvious dict modification in the code you've shared. Can you provide a [mcve]? I'd be quite interested in replicating the error on my own machine. – Kevin Nov 30 '16 at 19:32

2 Answers2

4

Does anyone see a spot where I could maybe be accidentally modifying the dict?

Given mst is a defaultdict, one possible suspect is:

for neighbor in self.mst[node].keys():

Since that could add node to self.mst. If this is the issue it leaves the question of how; for that more of the context / mst setup might help.


Seems like it does, accessing a non-existing key to a defaultdict will add the key. .... an mcve..

d = collections.defaultdict(int)
print(d)
keys = ['a','b','c']
for key in keys:
    print(key, hex(d[key]))
print(d)

>>> 
defaultdict(<class 'int'>, {})
a 0x0
b 0x0
c 0x0
defaultdict(<class 'int'>, {'a': 0, 'c': 0, 'b': 0})
>>>
wwii
  • 23,232
  • 7
  • 37
  • 77
everial
  • 302
  • 1
  • 10
  • Certainly indexing a defaultdict with a key it doesn't yet contain would cause its size to change. But it looks like `node` is always already a key in `mst`, since it originally gets its value from `for node in mst.keys()`... I agree that more context would be very useful. – Kevin Nov 30 '16 at 19:41
  • Agreed it'd be surprising; I was imagining a situation where some node thinks it has a neighbor that (for whatever reason) didn't end up in the original `mst` and it's dying on one of the recursive calls. edit: sniped by Kevin =) – everial Nov 30 '16 at 19:48
  • Oops, I see that `node` can also come from `for neighbor in...` in the recursive case. So this is still a real possibility. – Kevin Nov 30 '16 at 19:48
  • @wwii I'm not following how that snippet relates (other than being more code that uses a `defaultdict`--might elaborating? – everial Nov 30 '16 at 19:55
  • Feel free to delete. It sounded like you were not sure if it was possible - ```...could add node to self.mst. If this is the issue ..```, so I provided an example that was relavent to your uncertainty. FWIW I briefly searched the help center to see if there were restrictions on *improving* someone else's answer and didn't see any. It wasn't meant to step on anyones toes. – wwii Nov 30 '16 at 20:03
  • Apologies for the delay in responding @all, and especially apologize for asking what seems to be a debugging question. Anyways, you were exactly right. When I constructed the MST (Prim's), I mapped each source node to its minimum neighbor but I forgot to map it back the other way (i.e. `mst[dest][source] = edge_weight`. So the MST did not have all of the vertices from the original graph. – Jack Ryan Nov 30 '16 at 21:00
2

I guess the error is here:

for neighbor in self.mst[node].keys():
                     ^^^^^^^^^
    if neighbor not in visited:
        self.cluster_dfs(mst, neighbor, visited, cluster)
                         ^^^

mst[node] has key neighbor but that doesn't need to be true for mst itself