How to convert defaultdict of defaultdicts [of defaultdicts] to dict of dicts [of dicts]?

Question

Using this answer, I created a defaultdict of defaultdicts. Now, I'd like to turn that deeply nested dict object back into an ordinary python dict.

from collections import defaultdict

factory = lambda: defaultdict(factory)
defdict = factory()
defdict['one']['two']['three']['four'] = 5

# defaultdict(<function <lambda> at 0x10886f0c8>, {
#             'one': defaultdict(<function <lambda> at 0x10886f0c8>, {
#                 'two': defaultdict(<function <lambda> at 0x10886f0c8>, {
#                     'three': defaultdict(<function <lambda> at 0x10886f0c8>, {
#                         'four': 5})})})})

I assume this is not the right solution:

import json

regdict = json.loads(json.dumps(defdict))

# {u'one': {u'two': {u'three': {u'four': 5}}}}

Also, this answer is inadequate since it does not recurse on the nested dict(s).

http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem — Ignacio Vazquez-Abrams, Oct 21 '14 at 21:41
If you don't want a `defaultdict` in the end, have you considered writing a wrapper that generates things by using `setdefault` on ordinary `dict`s? Making your code slightly more complicated at construction time may be a better solution than making it slightly simpler and then adding an extra complication afterward… — abarnert, Oct 21 '14 at 21:50
Anyway, the `json` solution obviously won't work at all if you can have anything besides strings, numbers (where you don't care about the type of the numbers), bools, `None`, lists, and dicts… but you _could_ do something workable with YAML or pickle if you really wanted to. I wouldn't recommend it, but it's worth knowing why you don't want it. — abarnert, Oct 21 '14 at 21:52
Good question Martin. I am pickling the resulting dictionaries, and I was getting `PicklingError: Can't pickle ...` I probably would have asked something more specific, like how to solve my initial problem, but I was very curious to know the answer to this specific one, thinking there may have been some clever work with reduce() or something in itertools. — samstav, Oct 21 '14 at 21:53
@samstav: "How do I pickle a recursive `defaultdict` (and I don't care whether I get back `defaultdict` or `dict`)" is a good question; why didn't you ask that in the first place? — abarnert, Oct 21 '14 at 21:54
@abarnert The only answer I have to that is the basic response from above.... I was curious about this problem, thinking there may have been a clever solution using reduce() or itertools or something. I knew that I could have asked about the pickling... but I kinda got stuck on this. I have been digging into these parts of python lately and wanted to see if I could learn something new. tl;dr Because I'm a bonehead — samstav, Oct 21 '14 at 22:01
@samstav: The point is this is a clumsy and inefficient thing to do (Martijn's answer is as clean as it's going to get), and if you can instead not do it (either by not creating a `defaultdict` in the first place, or by converting on the fly at pickling time instead of converting the whole thing in-memory, or by just getting the `defaultdict` to pickle so the issue never arises), that's a better solution to almost any problem, and definitely to your specific one. — abarnert, Oct 21 '14 at 22:10
I see this, but only now that I've seen the answers. That's why I asked the question though... very often I think there's no efficient/correct way to solve a particular problem, so I need to backtrack until the problem in its current form "goes away". However, also quite often, I come across answers I never thought possible, using features of the language or other smarts that I simply didn't know about. I asked thinking there was something like that -- that I just didn't know about yet. — samstav, Oct 21 '14 at 22:20

score 55 · Accepted Answer · edited Feb 25 '18 at 18:47

55

You can recurse over the tree, replacing each defaultdict instance with a dict produced by a dict comprehension:

def default_to_regular(d):
    if isinstance(d, defaultdict):
        d = {k: default_to_regular(v) for k, v in d.items()}
    return d

Demo:

>>> from collections import defaultdict
>>> factory = lambda: defaultdict(factory)
>>> defdict = factory()
>>> defdict['one']['two']['three']['four'] = 5
>>> defdict
defaultdict(<function <lambda> at 0x103098ed8>, {'one': defaultdict(<function <lambda> at 0x103098ed8>, {'two': defaultdict(<function <lambda> at 0x103098ed8>, {'three': defaultdict(<function <lambda> at 0x103098ed8>, {'four': 5})})})})
>>> default_to_regular(defdict)
{'one': {'two': {'three': {'four': 5}}}}

edited Feb 25 '18 at 18:47

duhaime

25,611
17
169
224

answered Oct 21 '14 at 21:43

Martijn Pieters

1,048,767
296
4,058
3,343

This is a great solution but will not work on very large dictionaries due to the recursion limit. – Tennessee Leeuwenburg Oct 21 '14 at 21:47
2

@TennesseeLeeuwenburg: Only if by "very large" you mean "takes up half your memory" or "depth approaching 1000", neither of which is that common of a problem. At any rate, you can solve the first by changing this from top-down to bottom-up recursion, and the second by using a loop over an explicit stack instead of recursion, but either one is going to make your code lot more complicated for a problem that you usually aren't going to have… – abarnert Oct 21 '14 at 21:49
@TennesseeLeeuwenburg: how many dictionaries of depth > 1000 do you think you can create before running into memory issues? Note that you won't run into this issue with *wide* nesting, only with *deep* nesting. – Martijn Pieters Oct 21 '14 at 21:50
@MartijnPieters: Just to play devil's advocate: `d = {}; d[None] = d` fits in under 1KB, and `inf > 1000`… If that's cheating, a dict that's just a flat tree of depth 1001 (`d[None][None][None]…[None]=None`) should be a few hundred KB at worst. – abarnert Oct 21 '14 at 22:13
@abarnert: sure, but real-world nested structures are not that deep. And if they are they are wide too. – Martijn Pieters Oct 21 '14 at 22:14
@MartijnPieters: Yeah, I know, it's just that the way you phrased your question, well, I think I can create a whole lot of them. :) – abarnert Oct 21 '14 at 22:22
@MartijnPieters I'm just saying. I've hit recursion limits before, it can happen. There are plenty of software problems where the available data doesn't fit on a single disk, let alone into memory. It's a less-common case, which is why I prefixed by saying your solution was a great solution. – Tennessee Leeuwenburg Oct 22 '14 at 23:56
3

Replace `iteritems` by `items` for python3 – Yohan Obadia Jul 05 '17 at 11:36
1

If you have nested combination of both of them I mean dict to defaultdicts of dicts or of any order this approach won't work. You may need to change if condition to `if isinstance(d, defaultdict) or isinstance(d, dict)` – skt7 Jun 18 '18 at 09:57
1

@ShivamKThakkar: then just use `if isinstance(d, dict):`; `defaultdict` is a subclass of `dict` and passes that instance test just as much. And `isinstance()` takes a tuple for the second argument, if you need to test for multiple types where those types are not a subclass of another of the types, you'd use `isinstance(instance, (type1, type2, ...))`. – Martijn Pieters Jun 18 '18 at 11:18
is there anyway we can use for example d.default_factory = None, which will convert the defaultdict to dict, but for every defaultdict? – coredumped0x Dec 06 '21 at 18:40
@MurphyAdam: no. – Martijn Pieters Jan 19 '22 at 19:11

score 7 · Answer 2 · edited Mar 20 '17 at 10:29

What you're actually trying to do is pickle your recursive defaultdict. And you don't care whether you get back a dict or a defaultdict when unpickling.

While there are a number of ways to solve this (e.g., create a defaultdict subclass with its own pickling, or explicitly override the default one with copyreg), there's one that's dead trivial.

Notice the error you get when you try it:

>>> pickle.dumps(defdict)
PicklingError: Can't pickle <function <lambda> at 0x10d7f4c80>: attribute lookup <lambda> on __main__ failed

You can't pickle lambda-defined functions, because they're anonymous, meaning there's no way they could ever be unpickled.

But there is literally no reason this function needs to be defined by lambda. In particular, you don't even want it to be anonymous, because you're explicitly giving it a name. So:

def factory(): return defaultdict(factory)

And you're done.

Here it is in action:

>>> from collections import defaultdict
>>> def factory(): return defaultdict(factory)
>>> defdict = factory()
>>> defdict['one']['two']['three']['four'] = 5
>>> import pickle
>>> pickle.dumps(defdict)
b'\x80\x03ccollections\ndefaultdict\nq\x00c__main__\nfactory\nq\x01\x85q\x02Rq\x03X\x03\x00\x00\x00oneq\x04h\x00h\x01\x85q\x05Rq\x06X\x03\x00\x00\x00twoq\x07h\x00h\x01\x85q\x08Rq\tX\x05\x00\x00\x00threeq\nh\x00h\x01\x85q\x0bRq\x0cX\x04\x00\x00\x00fourq\rK\x05ssss.'

There are other cases where using lambda instead of def for no good reason will cause problems—you can't introspect your functions as well at runtime, you get worse tracebacks in the debugger, etc. Use lambda when you want an inherently-anonymous function, or a function you can define in the middle of an expression, but don't use it to save three characters of typing.

Will the pickled `defaultdict` load without issues in another script where `factory` is not defined? — bli, Jul 15 '20 at 15:15
I just tested (`with open("d.pickle", "wb") as fh: dump(defdict, fh)` in one script and `with open("d.pickle", "rb") as fh: defdict = load(fh)` in another one), and this results in an `AttributeError: Can't get attribute 'factory' on `. Converting to plain regular dict as in https://stackoverflow.com/a/26496899/1878788 helps. — bli, Jul 16 '20 at 08:15

How to convert defaultdict of defaultdicts [of defaultdicts] to dict of dicts [of dicts]?

2 Answers2

Linked