1

I have a generator that iterates over all combinations of keys up to a particular depth in a nested dictionary:

def iter_dict(levels, input_dict, items=[], sort=False, **sort_args):
    for dict_key, val in (sorted(input_dict.items(), **sort_args) if
                          sort else input_dict.items()):
        if levels == 1:
            yield items + [(dict_key, val)]
        else:
            yield from iter_dict(levels - 1, val, items + [(dict_key, val)])

So it acts like so:

>>> d = {'a': 1, 'b': 2}
>>> list(iter_dict(1, d))
[[('a', 1)], [('b', 2)]]

And

>>> d = {'a': {'c': 1}, 'b': {'d' : 2}}
>>> list(iter_dict(1, d))
[[('a', {'c': 1})], [('b', {'d': 2})]]
>>> list(iter_dict(2, d))
[[('a', {'c': 1}), ('c', 1)], [('b', {'d': 2}), ('d', 2)]]

Each iteration on the generator returns a list of tuples, with the nth tuple being (key, value) at a depth of n into the nested dictionary.

But I am implementing this function on huge dictionaries and am worried about reaching the maximum recursion depth level.

How can I rewrite the generator to remove the recursion?

texasflood
  • 1,571
  • 1
  • 13
  • 22
  • 1
    I don’t really understand what the output structure is supposed to mean (or what it could be used for)… – poke Oct 19 '15 at 10:40
  • It is useful for looping over all the key value pairs in a nested dictionary, up to a specified depth. Each iteration on the generator returns a list of tuples, with the nth tuple being `(key, value)` at a depth of n into the nested dictionary – texasflood Oct 19 '15 at 10:41
  • Do you expect to have dicts with 1000+ levels of nesting? Anyway, guess this can be done using a stack and storing sequences of keys on the stack (all the keys to the current sub-dict), but I'm not sure if it's worth the effort. – tobias_k Oct 19 '15 at 10:44
  • @tobias_k No, in fact there are maybe only 6 levels. But each level can contain thousands of keys, so the `yield from` line can be hit thousands of times. To clarify, I haven't actually hit the max recursion depth yet, but am worried I will when I implement this on large datasets. – texasflood Oct 19 '15 at 10:46
  • 1
    But this should not give you a max recursion _depth_ error, right? – tobias_k Oct 19 '15 at 10:47
  • 2
    Note that you're in danger of falling foul of [this classic Python gotcha](http://stackoverflow.com/q/1132941/3001761). As @tobias_k has pointed out, it seems unlikely that recursion depth will be a problem here. – jonrsharpe Oct 19 '15 at 10:47
  • @tobias_k Ah I see... I didn't realise that! Everything may be fine after all then – texasflood Oct 19 '15 at 10:49
  • @jonrsharpe I guess you referring to `items`? I realised this, but I don't think it's a problem currently because I never modify `items`, I creating a new list object every time with the `+` operator. – texasflood Oct 19 '15 at 10:55
  • Indeed, the default value for `items` is never used and an explicitly different object is always passed in each recursive call. So that’s not the problem here. – poke Oct 19 '15 at 11:09

1 Answers1

1

But I am implementing this function on huge dictionaries and am worried about reaching the maximum recursion depth level

Unless your dictionaries actually have 1000+ levels of nesting, this should not be a problem. Maximum recursion depth is really only about the depth; the branching factor is not an issue. That is, it can be quite an issue w.r.t. running time, but you won't get a maximum recursion depth error from that (and the running time will be just the same without recursion).

How can I rewrite the generator to remove the recursion?

I guess this can be done using a stack and storing sequences of keys on the stack (all the keys to the current sub-dict). Such a solution will probably be a bit more involved and not as elegant as the recursive algorith, so given the above, I don't think it's worth the effort.

But whatever, here you go (slightly simplified, without the sorting):

from functools import reduce
def iter_dict(levels, input_dict):
    def get_nested(keys):
        return reduce(lambda d, k: d[k], keys, input_dict)
    stack = [[k] for k in input_dict]
    while stack:
        keys = stack.pop()
        if len(keys) == levels:
            yield [(k, get_nested(keys[:i])) for i, k in enumerate(keys, 1)]
        else:
            stack.extend(keys + [k] for k in get_nested(keys))

Example:

>>> d = {'a': {'c': 1, "e": 2}, 'b': {'d' : 3, "f": 4}}
>>> list(iter_dict(2, d))
[[('a', {'c': 1, 'e': 2}), ('e', 2)],
 [('a', {'c': 1, 'e': 2}), ('c', 1)],
 [('b', {'d': 3, 'f': 4}), ('f', 4)],
 [('b', {'d': 3, 'f': 4}), ('d', 3)]]
tobias_k
  • 81,265
  • 12
  • 120
  • 179