2

My question emerged when looking here to find how to create a trie in python. The following code was given in the top-voted answer:

>>> _end = '_end_'
>>> 
>>> def make_trie(*words):
...     root = dict()
...     for word in words:
...         current_dict = root
...         for letter in word:
...             current_dict = current_dict.setdefault(letter, {})
...         current_dict[_end] = _end
...     return root
... 
>>> make_trie('foo', 'bar', 'baz', 'barz')
{'b': {'a': {'r': {'_end_': '_end_', 'z': {'_end_': '_end_'}}, 
'z': {'_end_': '_end_'}}}, 'f': {'o': {'o': {'_end_': '_end_'}}}}

I don't understand what purpose the line "current_dict = root" serves; seems like deleting that line and substituting all current_dict with root would do the same thing. (This same thought is expressed in this reply but with no answer.) I know this actually doesn't work as I tried it and an empty dictionary was returned.

I also tried putting print statements in the second for loop to see how current_dict and root were updated. I thought that since they were set to be equal, they referred to the same dictionary and would be updated simultaneously, but that wasn't the case.

Clearly, I have a fundamental misunderstanding of this interaction. Help?

Community
  • 1
  • 1
tobakudan
  • 35
  • 5

1 Answers1

0

You have to reset current_dict = root for each single word, because current_dict = current_dict.setdefault(letter, {}) sets current_dict to a new empty dictionary or existing sub-dictionary of root if the key is already in the dict.

dict.setdefault(k, d) is a bit tricky because it does two things at the same time. It works just like dict.get and returns the value of the key k if it exists, otherwise the default value d. And if the key doesn't exist it also inserts it with d as the value.

So, as you see, current_dict is not always the root dict, but also refers to the sub-dicts when you're iterating over the letters in the word, and you have to reset it to root to start again at the top level.

skrx
  • 19,980
  • 5
  • 34
  • 48
  • Thanks for the answer! So before the first iteration of the second for loop, 'current_dict = root = {}'. So far so good. After the first iteration, 'current_dict = {}' and 'root = {'f': {}}'. This part still confuses me. My understanding is that 'setdefault()' first changes the value of what 'current_dict' is pointing to (the same thing that 'root' is pointing to, which is why 'root' is updated) and then returns a value, which 'current_dict' is now set to point to, in this case the '{}' in 'root' with key 'f'. – tobakudan May 10 '17 at 13:44
  • But why doesn't 'current_dict' merely become an empty dictionary, not the specific one in 'root' with key 'f'? Because 'setdefault()' just returns '{}', not the particular '{}' in 'root' with key 'f', right? – tobakudan May 10 '17 at 13:44
  • `current_dict` is just a name that is attached to an object. At the start of the function both names `root` and `current_dict` are attached to the same "root" dict. Then setdefault inserts a new key and empty dict and also returns this empty dict and assigns it to the name `current_dict`. Now the two names `root` and `current_dict` refer to different objects: `root` still refers to the root dict and `current_dict` refers to the new empty sub-dict that has just been inserted. – skrx May 10 '17 at 15:28
  • Ned Batchelder gave a nice talk about this topic (if the names/values are the reason for the confusion): [Facts and Myths about Python names and values](https://www.youtube.com/watch?v=_AEJHKGk9ns) – skrx May 10 '17 at 16:26