215

Is there a way to make a defaultdict also be the default for the defaultdict? (i.e. infinite-level recursive defaultdict?)

I want to be able to do:

x = defaultdict(...stuff...)
x[0][1][0]
{}

So, I can do x = defaultdict(defaultdict), but that's only a second level:

x[0]
{}
x[0][0]
KeyError: 0

There are recipes that can do this. But can it be done simply just using the normal defaultdict arguments?

Note this is asking how to do an infinite-level recursive defaultdict, so it's distinct to Python: defaultdict of defaultdict?, which was how to do a two-level defaultdict.

I'll probably just end up using the bunch pattern, but when I realized I didn't know how to do this, it got me interested.

smci
  • 32,567
  • 20
  • 113
  • 146
Corley Brigman
  • 11,633
  • 5
  • 33
  • 40
  • Possible duplicate of [Python: defaultdict of defaultdict?](http://stackoverflow.com/questions/5029934/python-defaultdict-of-defaultdict) – malioboro Jan 13 '17 at 11:06
  • 2
    Not really... added info to the question to indicate why. Though that is a useful question. – Corley Brigman Jan 13 '17 at 19:54

12 Answers12

311

The other answers here tell you how to create a defaultdict which contains "infinitely many" defaultdict, but they fail to address what I think may have been your initial need which was to simply have a two-depth defaultdict.

You may have been looking for:

defaultdict(lambda: defaultdict(dict))

The reasons why you might prefer this construct are:

  • It is more explicit than the recursive solution, and therefore likely more understandable to the reader.
  • This enables the "leaf" of the defaultdict to be something other than a dictionary, e.g.,: defaultdict(lambda: defaultdict(list)) or defaultdict(lambda: defaultdict(set))
jmiserez
  • 2,991
  • 1
  • 23
  • 34
Chris W.
  • 37,583
  • 36
  • 99
  • 136
  • 5
    defaultdict(lambda: defaultdict(list)) The correct form ? – Yuvaraj Loganathan Feb 09 '15 at 10:12
  • Ooops, yes, the `lambda` form is correct--because the `defaultdict(something)` returns a dictionary-like object, but `defaultdict` expects a callable! Thank you! – Chris W. Feb 12 '15 at 20:39
  • 6
    This got marked as a possible duplicate of another question... but it wasn't my original question. I knew how to create a two-level defaultdict; what I didn't know was how to make it recursive. This answer is, in fact, similar to http://stackoverflow.com/questions/5029934/python-defaultdict-of-defaultdict – Corley Brigman Jan 13 '17 at 19:52
  • 1
    One draw back of the lambda approach is that the objects it generates can't be pickled... but you can get around this by casting to a regular `dict(result)` before the pickle – CpILL Mar 15 '20 at 23:38
  • This seems to only work for recursion levels greater than 1. For example `nested_dict['foo']['bar'].append('baz')` works, but `nested_dict['foo'].append('bar')` fails because the `defaultdict` class has no `append` attribute – Addison Klinke Feb 22 '21 at 22:27
  • @AddisonKlinke, I'm guessing you defined `nested_dict` as `defaultdict(lambda: defaultdict(list))`, for which `nested_dict['foo']` is a `defaultdict(list)` which does not have an `append` attribute, but `nested_dict['foo']['bar']` is a `list` which *does* have an `append` attribute. Meanwhile, if you had defined `nested_dict` as `defaultdict(lambda: defaultdict(dict))`, then `nested_dict['foo']['bar']` is a `dict` which does not have an `append` attribute. – joseville Oct 08 '21 at 16:28
252

For an arbitrary number of levels:

def rec_dd():
    return defaultdict(rec_dd)

>>> x = rec_dd()
>>> x['a']['b']['c']['d']
defaultdict(<function rec_dd at 0x7f0dcef81500>, {})
>>> print json.dumps(x)
{"a": {"b": {"c": {"d": {}}}}}

Of course you could also do this with a lambda, but I find lambdas to be less readable. In any case it would look like this:

rec_dd = lambda: defaultdict(rec_dd)
Andrew Clark
  • 202,379
  • 35
  • 273
  • 306
82

There is a nifty trick for doing that:

tree = lambda: defaultdict(tree)

Then you can create your x with x = tree().

BrenBarn
  • 242,874
  • 37
  • 412
  • 384
23

Similar to BrenBarn's solution, but doesn't contain the name of the variable tree twice, so it works even after changes to the variable dictionary:

tree = (lambda f: f(f))(lambda a: (lambda: defaultdict(a(a))))

Then you can create each new x with x = tree().


For the def version, we can use function closure scope to protect the data structure from the flaw where existing instances stop working if the tree name is rebound. It looks like this:

from collections import defaultdict

def tree():
    def the_tree():
        return defaultdict(the_tree)
    return the_tree()
wim
  • 338,267
  • 99
  • 616
  • 750
pts
  • 80,836
  • 20
  • 110
  • 183
  • 4
    i'll have to think about this one (it's a little more complex). but i think your point is that if do x = tree(), but then someone comes by later and does tree=None, this one would still work, and that would wouldn't? – Corley Brigman Oct 04 '13 at 20:58
23

I would also propose more OOP-styled implementation, which supports infinite nesting as well as properly formatted repr.

class NestedDefaultDict(defaultdict):
    def __init__(self, *args, **kwargs):
        super(NestedDefaultDict, self).__init__(NestedDefaultDict, *args, **kwargs)

    def __repr__(self):
        return repr(dict(self))

Usage:

my_dict = NestedDefaultDict()
my_dict['a']['b'] = 1
my_dict['a']['c']['d'] = 2
my_dict['b']

print(my_dict)  # {'a': {'b': 1, 'c': {'d': 2}}, 'b': {}}
Ciprian Tomoiagă
  • 3,773
  • 4
  • 41
  • 65
Stanislav Tsepa
  • 710
  • 8
  • 13
  • 2
    Neat ! I added the passthrough of `*args` and `**kwargs` which allows it to function like the `defaultdict`, namely to create a dict with keyword arguments. This is useful for passing `NestedDefaultDict` into `json.load` – Ciprian Tomoiagă Aug 06 '19 at 13:47
  • Trying `my_dict = NestedDefaultDict(list)` returns a `TypeError` - is the `*args` intended to allow the leaf type definition in that way? – Addison Klinke Feb 22 '21 at 22:23
  • @AddisonKlinke No, in this implementation it's not. The `default_factory` argument is already taken by `NestedDefaultDict` type. There is no easy way to check if current node is leaf or not without building more complex class. However, you can write something like `my_dict['a']['b'][0]` to imitate node with list type. – Stanislav Tsepa Feb 24 '21 at 10:28
  • Unfortunately, this answer does not appear to support multiprocessing. – Chris Coffee Jan 10 '23 at 19:02
  • I like this! However, it doesn't support increment or list append: `my_dict['a']['b'] += 1` ...will fail. Is there any way to get it to support the remaining dict operations? – RandallShanePhD Apr 11 '23 at 20:40
1

I based this of Andrew's answer here. If you are looking to load data from a json or an existing dict into the nester defaultdict see this example:

def nested_defaultdict(existing=None, **kwargs):
    if existing is None:
        existing = {}
    if not isinstance(existing, dict):
        return existing
    existing = {key: nested_defaultdict(val) for key, val in existing.items()}
    return defaultdict(nested_defaultdict, existing, **kwargs)

https://gist.github.com/nucklehead/2d29628bb49115f3c30e78c071207775

nucklehead
  • 96
  • 1
  • 4
1

Here is a function for an arbitrary base defaultdict for an arbitrary depth of nesting.

(cross posting from Can't pickle defaultdict)

def wrap_defaultdict(instance, times=1):
    """Wrap an instance an arbitrary number of `times` to create nested defaultdict.
    
    Parameters
    ----------
    instance - list, dict, int, collections.Counter
    times - the number of nested keys above `instance`; if `times=3` dd[one][two][three] = instance
    
    Notes
    -----
    using `x.copy` allows pickling (loading to ipyparallel cluster or pkldump)
        - thanks https://stackoverflow.com/questions/16439301/cant-pickle-defaultdict
    """
    from collections import defaultdict

    def _dd(x):
        return defaultdict(x.copy)

    dd = defaultdict(instance)
    for i in range(times-1):
        dd = _dd(dd)

    return dd
BML
  • 191
  • 2
  • 12
1

Based on Chris W answer, however, to address the type annotation concern, you could make it a factory function that defines the detailed types. For example this is the final solution to my problem when I was researching this question:

def frequency_map_factory() -> dict[str, dict[str, int]]:
    """
    Provides a recorder of: per X:str, frequency of Y:str occurrences.
    """
    return defaultdict(lambda: defaultdict(int))
hi2meuk
  • 1,080
  • 11
  • 9
0

here is a recursive function to convert a recursive default dict to a normal dict

def defdict_to_dict(defdict, finaldict):
    # pass in an empty dict for finaldict
    for k, v in defdict.items():
        if isinstance(v, defaultdict):
            # new level created and that is the new value
            finaldict[k] = defdict_to_dict(v, {})
        else:
            finaldict[k] = v
    return finaldict

defdict_to_dict(my_rec_default_dict, {})
Dr. XD
  • 41
  • 2
0

@nucklehead's response can be extended to handle arrays in JSON as well:

def nested_dict(existing=None, **kwargs):
    if existing is None:
        existing = defaultdict()
    if isinstance(existing, list):
        existing = [nested_dict(val) for val in existing]
    if not isinstance(existing, dict):
        return existing
    existing = {key: nested_dict(val) for key, val in existing.items()}
    return defaultdict(nested_dict, existing, **kwargs)
Josh Olson
  • 397
  • 2
  • 15
0

Here's a solution similar to @Stanislav's answer that works with multiprocessing and also allows for termination of the nesting:

from collections import defaultdict
from functools import partial

class NestedDD(defaultdict):
    def __init__(self, n, *args, **kwargs):
        self.n = n
        factory = partial(build_nested_dd, n=n - 1) if n > 1 else int
        super().__init__(factory, *args, **kwargs)

    def __repr__(self):
        return repr(dict(self))

def build_nested_dd(n):
    return NestedDD(n)
0

Here is a solution similar to @Chris W., that makes more levels possible. It still allows the 'leaf' to be specified as something besides defaultdict.

Instead of a lambda, a closure is defined.

You might prefer this method because,

  • the declaration of the nested defaultdict is written as nested functions so it might read easier.
  • More than two levels are possible.
  • The last leaf can be: list, set, ...

Here is an example.

from collections import defaultdict
import json

def another_defaultdict(factory):
    'return another layer of defaultdict as a factory function'
    def layer():
        return defaultdict(factory)  
    return layer




>>> # two levels
>>> d = defaultdict(another_defaultdict(list))

>>> # three levels
>>> d = defaultdict(another_defaultdict(another_defaultdict(list)))


>>> d['Canada']['Alberta'] = ['Calgary', 'Magrath', 'Cardston', 'Lethbridge']
>>> d['France']['Nord'] = ['Dunkirk', 'Croix']
>>> print(json.dumps(d, indent=2))
{
  "Canada": {
    "Alberta": [
      "Calgary",
      "Magrath",
      "Cardston",
      "Lethbridge"
    ]
  },
  "France": {
    "Nord": [
      "Dunkirk",
      "Croix"
    ]
  }
}

brocla
  • 123
  • 6