8

I saw this example at pythontips. I do not understand the second line when defaultdict takes an argument "tree" and return a "tree".

import collections
tree = lambda: collections.defaultdict(tree)
some_dict = tree()
some_dict['color']['favor'] = "yellow"
# Works fine

After I run this code, I checked the type of some_dict

defaultdict(< function < lambda > at 0x7f19ae634048 >, 
            {'color': defaultdict(
                  < function < lambda > at 0x7f19ae634048 >, {'favor': 'yellow'})})
Prune
  • 76,765
  • 14
  • 60
  • 81
Z-Jiang
  • 189
  • 2
  • 10
  • btw the tip is from http://book.pythontips.com/en/testing/collections.html – sudo Aug 13 '18 at 18:47
  • It's a defaultdict of defacultdicts. It uses a recursive lambda function to generate a new empty defaultdict that itself will generate other defaultdicts when needed. – Gabriel Aug 13 '18 at 18:52
  • Similar question: https://stackoverflow.com/questions/19189274/defaultdict-of-defaultdict-nested (not exactly a duplicate, though) – mkrieger1 Aug 13 '18 at 18:56
  • 3
    This code is very clever. Perhaps too clever. – Gabriel Aug 13 '18 at 18:57
  • 1
    It's no more clever than any recursive function in Python. It's just using the name that the function currently being defined will be assigned to. – chepner Aug 13 '18 at 19:01
  • 3
    That recursive function is called inside `defaultdict`, which is why it's so clever. – Gabriel Aug 13 '18 at 19:03
  • 1
    @Gabriel How are you supposed to define a recursive data type without a recursive definition? – Jared Smith Aug 13 '18 at 19:44

3 Answers3

6

This is a pretty clever way to create a recursive defaultdict. It's a little tricky to understand at first but once you dig into what's happening, it's actually a pretty simple use of recursion.

In this example, we define a recursive lambda function, tree, that returns a defaultdict whose constructor is tree. Let's rewrite this using regular functions for clarity.

from collections import defaultdict
from pprint import pprint

def get_recursive_dict():
    return defaultdict(get_recursive_dict)

Note that we're returning defaultdict(get_recursive_dict) and not defaultdict(get_recursive_dict()). We want to pass defaultdict a callable object (i.e. the function get_recursive_dict). Actually calling get_recursive_dict() would result in infinite recursion.

If we call get_recursive_dict, we get an empty defaultdict whose default value is the function get_recursive_dict.

recursive_dict = get_recursive_dict()
print(recursive_dict)
# defaultdict(<function get_recursive_dict at 0x0000000004FFC4A8>, {})

Let's see this in action. Create the key 'alice' and it's corresponding value defaults to an empty defaultdict whose default value is the function get_recursive_dict. Notice that this is the same default value as our recursive_dict!

print(recursive_dict['alice'])
# defaultdict(<function get_recursive_dict at 0x0000000004AF46D8>, {})
print(recursive_dict)
# defaultdict(<function get_recursive_dict at 0x0000000004AF46D8>, {'alice': defaultdict(<function get_recursive_dict at 0x0000000004AF46D8>, {})})

So we can create as many nested dictionaries as we want.

recursive_dict['bob']['age'] = 2
recursive_dict['charlie']['food']['dessert'] = 'cake'
print(recursive_dict)
# defaultdict(<function get_recursive_dict at 0x00000000049BD4A8>, {'charlie': defaultdict(<function get_recursive_dict at 0x00000000049BD4A8>, {'food': defaultdict(<function get_recursive_dict at 0x00000000049BD4A8>, {'dessert': 'cake'})}), 'bob': defaultdict(<function get_recursive_dict at 0x00000000049BD4A8>, {'age': 2}), 'alice': defaultdict(<function get_recursive_dict at 0x00000000049BD4A8>, {})})

Once you overwrite the default value with a key, you can no longer create arbitrarily deep nested dictionaries.

recursive_dict['bob']['age']['year'] = 2016
# TypeError: 'int' object does not support item assignment

I hope this clears things up!

Daniel Ong
  • 283
  • 1
  • 12
3

Two points to note:

  1. lambda represents an anonymous function.
  2. Functions are first-class objects in Python. They may be assigned to a variable like any other object.

So here are 2 different ways to define functionally identical objects. They are recursive functions because they reference themselves.

from collections import defaultdict

# anonymous
tree = lambda: defaultdict(tree)

# explicit
def tree(): return defaultdict(tree)

Running the final 2 lines with these different definitions in turn, you see only a subtle difference in the naming of the defaultdict type:

# anonymous
defaultdict(<function __main__.<lambda>()>,
            {'color': defaultdict(<function __main__.<lambda>()>,
                         {'favor': 'yellow'})})

# explicit
defaultdict(<function __main__.tree()>,
            {'color': defaultdict(<function __main__.tree()>,
                         {'favor': 'yellow'})})
jpp
  • 159,742
  • 34
  • 281
  • 339
1

It's easier to see if you try this: a = lambda: a, you'll see that a() returns a. So...

>>> a = lambda: a
>>> a()()()()
<function <lambda> at 0x102bffd08>

They're doing this with the defaultdict too. tree is a function returning a defaultdict whose default value is yet another defaultdict, and so on.

I wasn't actually aware of this either. I thought tree would have to be defined first. Maybe it's a special Python rule? (EDIT:) No, I forgot that Python does the name lookup at runtime, and tree already points to the lambda then. In C++ there's compile-time reference checking, but you can define functions that reference themselves.

It seems like a way to create behavior that some users wouldn't expect. Like say you accidentally redefine tree later, your defaultdict is broken:

>>> import collections
>>> tree = lambda: collections.defaultdict(tree)
>>> some_dict = tree()
>>> tree = 4
>>> some_dict[4][3] = 2 # TypeError: first argument must be callable or None
sudo
  • 5,604
  • 5
  • 40
  • 78
  • 1
    It's not special; name lookups occur when a value is needed, and the value isn't needed until the function is *called*. – chepner Aug 13 '18 at 18:55
  • 1
    No special python rule. When the lambda is ran the `tree` is already defined. – Gabriel Aug 13 '18 at 18:55
  • Right, that makes sense. So now you rely on `a` not changing. if you do `a = lambda: a` then `b = a` then `a = 3`, `b()` will return 3 now. – sudo Aug 13 '18 at 18:57