2

I have this code in my script to perform Huffman encoding:

def huffmanEncoding(freqDict):
    for key in freqDict.keys():
        freqDict[HuffmanTree(value=key)] = freqDict.pop(key)
...

What I want to do is to replace each of the keys in the dictionary with a tree node whose value is the original key. The HuffmanTree class works properly.

However, this code has very weird behavior. With debugging tools I found out that sometimes some of the keys were processed twice or more times, which means they are first transformed into a tree node, then transformed again, using its current tree node as the value for the new tree node.

I replaced my code with the one shown following:

def huffmanEncoding(freqDict):
    keys = list(freqDict.keys())
    for key in keys:
        freqDict[HuffmanTree(value=key)] = freqDict.pop(key)

and now it works properly. But can someone explain why my first version has such weird behavior? If I want to change all the keys in a dictionary, should I always use the second version?

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • 1
    You are presumably using Python 3, because the behaviour you describe is specific to Python dictionary views. – Martijn Pieters Oct 26 '15 at 09:32
  • unrelated: no need to call `.keys()` here, you could use instead: `for key in list(freqDict): ..` – jfs Oct 26 '15 at 13:09

2 Answers2

7

You are adding keys to and removing keys from the dictionary while iterating. That means that the order of the keys also can change.

Normally, Python would raise an exception when you do this, but because you are both deleting and adding one key each iteration, the internal checks that would raise that exception fail to detect that you are making changes while iterating.

Iterate over a copy of the keys view:

def huffmanEncoding(freqDict):
    for key in list(freqDict):
        freqDict[HuffmanTree(value=key)] = freqDict.pop(key)

The list() call copies all the keys to a separate list object. Rather than you iterating over a live view of the dictionary keys you iterate over a static unchanging list. Popping keys from the original dict will not also remove those keys from the list copy, nor does setting new keys result in the list gaining more keys. That makes the loop entirely stable.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
0

The second form will take all keys from the dictionary, and turn it into a list. This list is not connected to the dictionary, so changing the keys in the dictionary won't alter that list.

In the first form, keys returns an iterator (it will generate values on the fly), so changing the dictionary might cause keys you add inside the for loop to be returned by the keys iterator.

Always use the second form.

ojii
  • 4,729
  • 2
  • 23
  • 34