Thanks to some great folks on SO, I discovered the possibilities offered by collections.defaultdict, notably in readability and speed. I have put them to use with success.

Now I would like to implement three levels of dictionaries, the two top ones being defaultdict and the lowest level holding int values. I can't find the appropriate way to do this. Here is my attempt:

from collections import defaultdict
d = defaultdict(defaultdict)
a = [("key1", {"a1":22, "a2":33}),
     ("key2", {"a1":32, "a2":55}),
     ("key3", {"a1":43, "a2":44})]
for i in a:
    d[i[0]] = i[1]

Now this works, but the following, which is the desired behavior, doesn't:

d["key4"]["a1"] + 1

I suspect that I should have declared somewhere that the second level defaultdict is of type int, but I didn't find where or how to do so.

The reason I am using defaultdict in the first place is to avoid having to initialize the dictionary for each new key.

Any more elegant suggestion?

Thanks pythoneers!

Morlock

6 Answers

388

Use:

from collections import defaultdict
d = defaultdict(lambda: defaultdict(int))

This will create a new defaultdict(int) whenever a new key is accessed in d.
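As a minimal sketch of how this plays out with the data from the question: the loop below uses update() rather than plain assignment, which is my own adjustment and not part of the original answer. It is there because `d[key] = values` would replace the auto-created inner defaultdict with an ordinary dict.

```python
from collections import defaultdict

d = defaultdict(lambda: defaultdict(int))

a = [("key1", {"a1": 22, "a2": 33}),
     ("key2", {"a1": 32, "a2": 55}),
     ("key3", {"a1": 43, "a2": 44})]

for key, values in a:
    # d[key] creates an inner defaultdict(int) on first access;
    # update() copies the values into it instead of replacing it.
    d[key].update(values)

d["key4"]["a1"] += 1   # neither level existed before this line
d["key1"]["a3"] += 5   # existing outer key, new inner key
```

Both increments succeed without any prior initialization, because a missing inner key starts from int(), i.e. 0.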

StevenWernerCS
interjay
  • Only problem is it won't pickle, meaning `multiprocessing` is unhappy about sending these back and forth. – Noah Mar 27 '12 at 16:49
  • @Noah: It will pickle if you use a named module-level function instead of a lambda. – interjay Mar 27 '12 at 17:28
  • @interjay can you elaborate on this please? – ScienceFriction Oct 10 '13 at 21:44
  • @ScienceFriction Anything specific that you need help with? When `d[new_key]` is accessed, it will call the lambda which will create a new `defaultdict(int)`. And when `d[existing_key][new_key2]` is accessed, a new `int` will be created. – interjay Oct 11 '13 at 12:53
  • This is awesome. It seems I renew my marital vows to Python daily. – mVChr Nov 03 '14 at 22:32
  • Looking for more details about using this method with `multiprocessing` and what a named module-level function is? This [question](http://stackoverflow.com/questions/16439301/cant-pickle-defaultdict) follows up. – Cecilia Apr 15 '15 at 17:03
  • I get a `NameError: global name 'defaultdict' is not defined` error calling `d["key4"]["a1"] +=1` . What's going on? I've imported `defaultdict` from `collections` so it should be in my namespace. – David Boshton Jun 12 '15 at 08:53
  • ultimate answer!... Thanks. worked in both 3.5 and 2.7. – Hara Jun 06 '16 at 13:28
  • How can this be generalized to work with an arbitrary number of levels? edit: i'm now seeing a newer answer near the bottom which generalizes by subclassing defaultdict and that will work nicely for me. out of curiosity, can THIS answer (ie the lambda approach) be similarly generalized? – ibonyun May 29 '19 at 19:54
  • @ibonyun You can add an additional level in the same manner, e.g. `d = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))` – interjay May 29 '19 at 20:06
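On the question of arbitrary depth, one possible sketch builds the nested factory in a loop. The helper name `nested_defaultdict` and its parameters are hypothetical, not from the thread. It uses functools.partial instead of a lambda, which sidesteps the late-binding pitfall of lambdas created in a loop.

```python
from collections import defaultdict
from functools import partial


def nested_defaultdict(depth, leaf_factory):
    """Return a defaultdict nested `depth` levels deep.

    Values at the innermost level are created by `leaf_factory`.
    (Hypothetical helper, for illustration only.)
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    factory = leaf_factory
    # Wrap the factory once per extra level, innermost first.
    for _ in range(depth - 1):
        factory = partial(defaultdict, factory)
    return defaultdict(factory)


counts = nested_defaultdict(3, int)
counts["a"]["b"]["c"] += 1
```

With depth 2 and int this builds the same structure as `defaultdict(lambda: defaultdict(int))`, only with a partial object as the factory.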
36

Another way to make a pickleable, nested defaultdict is to use a partial object instead of a lambda:

from functools import partial
...
d = defaultdict(partial(defaultdict, int))

This will work because the defaultdict class is globally accessible at the module level:

"You can't pickle a partial object unless the function [or in this case, class] it wraps is globally accessible ... under its __name__ (within its __module__)" -- Pickling wrapped partial functions
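A small round-trip sketch of the difference, assuming the default pickle protocol. The variable names are illustrative, and the exact exception raised for the lambda version can vary between Python versions, so the example catches both likely types.

```python
import pickle
from collections import defaultdict
from functools import partial

d = defaultdict(partial(defaultdict, int))
d["key4"]["a1"] += 1

# The partial-based factory survives a pickle round trip.
restored = pickle.loads(pickle.dumps(d))
restored["key5"]["a2"] += 2   # the restored object still auto-creates levels

# The lambda-based factory cannot be pickled, because a lambda
# cannot be looked up by name in its module.
lambda_version = defaultdict(lambda: defaultdict(int))
try:
    pickle.dumps(lambda_version)
    lambda_pickles = True
except (pickle.PicklingError, AttributeError):
    lambda_pickles = False
```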

Nathaniel Gentile
14

Look at nosklo's answer here for a more general solution.

class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value

Testing:

a = AutoVivification()

a[1][2][3] = 4
a[1][3][3] = 5
a[1][2]['test'] = 6

print(a)

Output (Python 3.7+, where dicts keep insertion order):

{1: {2: {3: 4, 'test': 6}, 3: {3: 5}}}
Smart Manoj
miles82
  • Thanks for the link @miles82 (and the edit, @voyager). How pythonesque and safe is this approach? – Morlock Apr 08 '10 at 14:57
  • Unfortunately this solution doesn't preserve the handiest part of defaultdict, which is the power to write something like D['key']+=1 without worrying about the existence of the key. That's the main feature I use defaultdict for... but I can imagine dynamically deepening dictionaries are pretty handy too. – rschwieb Mar 25 '14 at 00:21
  • @rschwieb you can add the power to write += 1 by adding __add__ method. – spazm Aug 21 '14 at 21:54
7

As per @rschwieb's request for D['key'] += 1, we can expand on the previous answer by defining an __add__ method, which makes this behave more like a collections.Counter().

First, __missing__ is called to create a new empty value, and __add__ is then called on that value. We test the value, counting on empty containers to be falsy.

See emulating numeric types for more information on overriding.

from numbers import Number


class autovivify(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

    def __add__(self, x):
        """ override addition for numeric types when self is empty """
        if not self and isinstance(x, Number):
            return x
        raise ValueError

    def __sub__(self, x):
        if not self and isinstance(x, Number):
            return -1 * x
        raise ValueError

Examples:

>>> import autovivify
>>> a = autovivify.autovivify()
>>> a
{}
>>> a[2]
{}
>>> a
{2: {}}
>>> a[4] += 1
>>> a[5][3][2] -= 1
>>> a
{2: {}, 4: 1, 5: {3: {2: -1}}}

Rather than checking that the argument is a Number (very un-Pythonic, amirite!), we could just provide a default value of 0 and then attempt the operation:

class av2(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

    def __add__(self, x):
        """ override addition when self is empty """
        if not self:
            return 0 + x
        raise ValueError

    def __sub__(self, x):
        """ override subtraction when self is empty """
        if not self:
            return 0 - x
        raise ValueError
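A brief usage sketch for av2, repeating the class so the snippet runs on its own. The key names and the `non_empty_rejected` flag are made up for illustration.

```python
class av2(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

    def __add__(self, x):
        """ override addition when self is empty """
        if not self:
            return 0 + x
        raise ValueError

    def __sub__(self, x):
        """ override subtraction when self is empty """
        if not self:
            return 0 - x
        raise ValueError


a = av2()
a["hits"] += 1       # the empty placeholder is replaced by 0 + 1
a["ratio"] += 2.5    # any type that supports 0 + x works
a["x"]["y"] -= 3     # nested levels are created on the way down

non_empty_rejected = False
try:
    a["x"] += 1      # a["x"] now holds {'y': -3}, so it is not empty
except ValueError:
    non_empty_rejected = True
```

Adding to a non-empty node raises ValueError, which keeps an accidental `+=` from silently discarding a populated subtree.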
spazm
6

Late to the party, but for arbitrary depth I just found myself doing something like this:

from collections import defaultdict

class DeepDict(defaultdict):
    def __call__(self):
        return DeepDict(self.default_factory)

The trick here is basically to make the DeepDict instance itself a valid factory for constructing missing values. Now we can do things like

dd = DeepDict(DeepDict(list))
dd[1][2].extend([3,4])
sum(dd[1][2])  # 7

ddd = DeepDict(DeepDict(DeepDict(list)))
ddd[1][2][3].extend([4,5])
sum(ddd[1][2][3])  # 9
Rad Haring
1
def _sub_getitem(self, k):
    try:
        # sub.__class__.__bases__[0]
        real_val = self.__class__.mro()[-2].__getitem__(self, k)
        val = '' if real_val is None else real_val
    except Exception:
        val = ''
        real_val = None
    # isinstance(Avoid, dict) is also True, which would recurse forever
    if type(val) in (dict, list, str, tuple):
        val = type('Avoid', (type(val),), {'__getitem__': _sub_getitem, 'pop': _sub_pop})(val)
        # Reassign the current dict key to the returned value, so that later assignments to it propagate back
        if all([real_val is not None, isinstance(self, (dict, list)), type(k) is not slice]):
            self[k] = val
    return val


def _sub_pop(self, k=-1):
    try:
        val = self.__class__.mro()[-2].pop(self, k)
        val = '' if val is None else val
    except Exception:
        val = ''
    if type(val) in (dict, list, str, tuple):
        val = type('Avoid', (type(val),), {'__getitem__': _sub_getitem, 'pop': _sub_pop})(val)
    return val


class DefaultDict(dict):
    def __getitem__(self, k):
        return _sub_getitem(self, k)

    def pop(self, k):
        return _sub_pop(self, k)

In[8]: d=DefaultDict()
In[9]: d['a']['b']['c']['d']
Out[9]: ''
In[10]: d['a']="ggggggg"
In[11]: d['a']
Out[11]: 'ggggggg'
In[12]: d['a']['pp']
Out[12]: ''

Again, no errors, no matter how many levels are nested. pop does not raise an error either.

dd=DefaultDict({"1":333333})

ACE Fly