
How do I create a dict of dicts of lists using defaultdict? I am getting the following error.

>>> from collections import defaultdict
>>> a=defaultdict()
>>> a["testkey"]=None
>>> a
defaultdict(None, {'testkey': None})
>>> a["testkey"]["list"]=[]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object does not support item assignment
file2cable

3 Answers


It's a little tricky. You make a defaultdict of defaultdicts, like so:

defaultdict(lambda: defaultdict(list))
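
A quick interactive sketch (the key names are just illustrative) showing that missing keys auto-vivify at both levels:

>>> from collections import defaultdict
>>> d = defaultdict(lambda: defaultdict(list))
>>> d["outer"]["inner"].append("value")  # both levels are created on first access
>>> d["outer"]["inner"]
['value']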
wim
  • This is perfectly fine (up-voted), but since I have a pathology about avoiding `lambda` functions whenever possible (they're abused regularly in cases where there are always better options, e.g. using `map`/`filter` with `lambda` is _always_ slower than a listcomp/genexpr that inlines the `lambda` body; I react to said abuse by resolving to avoid them whenever possible), this approach uses the bound method of an empty `defaultdict(list)` to avoid the `lambda` and push all the work to the C layer, which (trivially, 5-10% faster) improves runtime speed: `defaultdict(defaultdict(list).copy)` – ShadowRanger Mar 02 '16 at 18:40
  • Hey, cool idea! I hate lambdas too. Feel free to edit it into the answer, or add your own answer. – wim Mar 02 '16 at 19:12
  • @wim: [Sure, why not](http://stackoverflow.com/a/35759455/364696)? Inflict my OCD at top-level. :-) – ShadowRanger Mar 02 '16 at 22:29
  • +1 for you. Maybe you'll enjoy my lambda-hating question [here](http://stackoverflow.com/q/13219320/674039) – wim Mar 02 '16 at 23:23

Slightly faster than using a lambda (on CPython versions before 3.8; see the update below):

defaultdict(defaultdict(list).copy)

This has the same observable behavior as wim's answer, but avoids the lambda in favor of a bound built-in method that (in CPython) is implemented in C, which means default value generation doesn't have to execute any Python bytecode or look up any names, so it runs slightly faster. In microbenchmarks on CPython 3.5, the cost paid when a key did not exist at access time is about 5-10% lower this way than with the otherwise equivalent lambda.
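
As a small sketch of why the bound method works (the names factory and d here are just illustrative): copy is bound to a throwaway empty defaultdict(list), so every call returns a fresh, empty defaultdict(list):

>>> from collections import defaultdict
>>> factory = defaultdict(list).copy  # bound copy method of a throwaway empty defaultdict(list)
>>> d = defaultdict(factory)
>>> d["outer"]["inner"].append(1)     # missing keys auto-vivify at both levels, same as the lambda
>>> d["outer"]["inner"]
[1]
>>> d["other"] is d["outer"]          # each factory call returns a fresh, independent dict
False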

Really, the reason I prefer it is that I hate lambda, thanks to people overusing it where it's a bad idea (e.g. map/filter with lambda is always more verbose and slower than an equivalent listcomp/genexpr, yet people keep doing it anyway for no discernible reason), even though in this case it hardly matters.


Update: As of CPython 3.8, this performance improvement is gone, and the lambda is faster (~3% lower runtime with the lambda on 3.8, ~7% on 3.9) in simple IPython microbenchmarks. If you wish to reproduce my tests, here is what I ran:

>>> from collections import defaultdict
>>> %%timeit dd = defaultdict(lambda: defaultdict(list)); o = object
... dd[o()]

>>> %%timeit dd = defaultdict(defaultdict(list).copy); o = object
... dd[o()]

where caching o = object minimizes lookup expense and gives us extremely cheap, guaranteed-unique keys whose mere access forces auto-vivification of a list while doing no other work.

The performance improvement in 3.8 is likely due largely to the introduction of a per-opcode cache for the LOAD_GLOBAL instruction, which reduces the cost of looking up defaultdict and list within the lambda from a full dict lookup (two of them in the case of list, which lives in builtins) to a quick check of the dict's version tag followed by a cheap load from the cache, cutting that cost by ~40%.

The 3.9 improvement likely (I'm not sure on this) relates to CPython's internals moving to optimize and favor vectorcall code paths, at the expense of non-vectorcall code paths (which the defaultdict(list).copy approach uses more of, relatively speaking). Even before these changes, defaultdict(list).copy had some inefficiencies that the lambda lacked, which left some margin for improving on it.
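
For the curious, disassembling the lambda shows the two global lookups that the 3.8+ opcode cache speeds up (the output below is roughly what CPython 3.8/3.9 prints; exact opcodes differ on other versions):

>>> import dis
>>> from collections import defaultdict
>>> dis.dis(lambda: defaultdict(list))
  1           0 LOAD_GLOBAL              0 (defaultdict)
              2 LOAD_GLOBAL              1 (list)
              4 CALL_FUNCTION            1
              6 RETURN_VALUE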

ShadowRanger
  • Hi, could you please show your benchmarks? I could not reproduce the speed increase – wim Apr 20 '16 at 19:05
  • @wim: It's just an ipython microbenchmark that intentionally hammers the "auto-vivify" case, and like I said, even then, the gain is small. Just two lines `%%timeit -r5 dd = defaultdict(defaultdict(list).copy); k = itertools.count(10000).__next__`, with the "body" being just `dd[k()]`. On my machine, Linux x86-64, Python 3.5, the cost per loop is 504 ns; using `lambda: defaultdict(list)` instead is 559 ns. Like I said, not a big deal. My main motivation (such as it is, I don't actually think the `lambda` is bad here) is `lambda` hate (and hey, it's shorter too!), not micro-optimizations. :-) – ShadowRanger Apr 20 '16 at 19:50
  • You may want to qualify the performance gain mentioned in your answer by mentioning the python version. – wim Apr 20 '16 at 20:39
  • As much as I want to like this answer, the performance gains appear to be gone. The lambda seems to be reliably about 10% faster (CPython 3.9 on Linux). – wim Jan 12 '21 at 18:11
  • @wim: Yup, can confirm. At a guess (educated, from code inspection) the changes are: 1) 3.8 added a per-opcode cache for `LOAD_GLOBAL` that dramatically reduced the lookup cost to find `defaultdict`/`list` for the `lambda`, and 2) The code path `defaultdict.copy` uses to copy does basically the same thing as the `lambda` (slightly cheaper than even cached `LOAD_GLOBAL`, but not by as much), but uses a C varargs based function invocation technique (`PyObject_CallFunctionObjArgs`) that is more general (and slower) than the more optimized path the bytecode interpreter loop uses. – ShadowRanger Jan 12 '21 at 18:48
  • 3) The `defaultdict.copy` code path passes along a second argument to `defaultdict` (it passes the existing `default_factory` followed by `self`); even though `self` is empty, `dict.__init__` has to process it (doing nothing, but calling extra functions). The `lambda` doesn't pass that second arg, so some additional thumb-twiddling is avoided. In some future version of Python, `defaultdict(list).copy` might win again (e.g. a trivial change to `new_defdict` could make it avoid a varargs code path, recognizing a copy of an empty `dict` might get cheaper), but right now, `lambda` is faster. – ShadowRanger Jan 12 '21 at 18:54
  • @wim: I've updated the answer to include this information. I wouldn't really rely on it either way, and the improvement is trivial either way, so my original point (I hate `lambda`s because they're overused) still holds, but yes, there is no longer any performance advantage to them (not that I claimed the performance advantage was important in the first place). I did add my microbenchmark to the answer as well, so people can confirm for themselves on their own for arbitrary versions. – ShadowRanger Jan 12 '21 at 19:12

You may have to do it like this:

>>> from collections import defaultdict
>>> a=defaultdict()
>>> a["testkey"]=None
>>> a["testkey"]=defaultdict(list)
>>> a["testkey"]["list"]=["a","b","c"]
>>> a
defaultdict(None, {'testkey': defaultdict(<type 'list'>, {'list': ['a', 'b', 'c']})})
nohup
  • No, you don't; you can pass a custom function to the first `defaultdict`. – jonrsharpe Oct 12 '15 at 12:17
  • Thank you @jonrsharpe, but I tried on my console and it worked. Is that a wrong way to do this? – nohup Oct 12 '15 at 12:19
  • It will work, but it's a bit awkward to mix `defaultdict` and explicit value setting like that. Also, the `a["testkey"]=None` is completely redundant. – jonrsharpe Oct 12 '15 at 12:20
  • I see. So I guess a lambda function inside a `defaultdict()` would be the best way? – nohup Oct 12 '15 at 12:26
  • Thank you @nohup for the answer, +1 for making it work, but I prefer to do it the functional way, using a lambda function. – file2cable Oct 12 '15 at 12:31