12

I am struggling to process a nested dictionary, and return the nested Parent Keys, for a specific Value, when the Value may exist more than once in the nested dictionary. For example:

example_dict = { 'key1' : 'value1',
                 'key2' : 'value2',
                 'key3' : { 'key3a': 'value3a' },
                 'key4' : { 'key4a': { 'key4aa': 'value4aa',
                                       'key4ab': 'value4ab',
                                       'key4ac': 'value1'},
                            'key4b': 'value4b'}
                }

You will notice that 'value1' appears twice in the above dictionary, and I would like to create a function that returns either a single list, or a series of lists, that identify the different Parent Keys, which in this case would be 'key1' and ('key4', 'key4a', key4ac).

This type of problem was dealt with elsewhere on this site, when the Value one was looking for only appeared once, and was readily handled by the following recursive function:

def find_key(d,key):
    for k,v in d.items():
        if isinstance(v,dict):
            p = find_key(v,key)
            if p:
                return [k] + p
        elif v == key:
            return [k]

print find_key(example_dict,'value4ac').

If you run the above code on the dictionary, I only get one answer for the parent keys. Any help would be much appreciated, Thanks!

zero323
  • 322,348
  • 103
  • 959
  • 935
Mike
  • 133
  • 2
  • 5
  • Are you doing such searches repeatedly, or only doing one ever? If you're doing more than one, you'll almost certainly want to create a reverse-mapping dictionary, and just access that, instead of brute-force searching the entire dictionary each time. – abarnert Sep 16 '13 at 04:14

2 Answers2

12

Unless you're just doing a single search (or you're incredibly constrained on memory but have CPU time to burn…), you'll want to build a reverse-lookup dictionary, and then you can just use that.


To make that easier, I'm going to do it in two steps. First, convert a nested dictionary to a key-path dictionary:

def keypaths(nested):
    for key, value in nested.iteritems():
        if isinstance(value, collections.Mapping):
            for subkey, subvalue in keypaths(value):
                yield [key] + subkey, subvalue
        else:
            yield [key], value

Print out list(keypaths(example_dict)) if it isn't obvious what this does.


Now, how do you create a reverse dictionary? For a one-to-one-mapping, you can just do this:

reverse_dict = {value: keypath for keypath, value in keypaths(example_dict)}

But for a many-to-one mapping like yours, the reverse is one-to-many, so we need to map each value to a list of keys. So:

reverse_dict = {}
for keypath, value in keypaths(example_dict):
    reverse_dict.setdefault(value, []).append(keypath)

And now you don't need anything fancy; just do a normal dict lookup on reverse_dict:

>>> reverse_dict['value2']
[('key2',)]
>>> reverse_dict['value1']
[('key1',), ('key4', 'key4a', 'key4ac')]
>>> reverse_dict['value3']
KeyError: 'value3'

If you'd prefer the last one to return [] instead of raising a KeyError, you can use a defaultdict(list) instead of a plain dict, and then you don't need setdefault.


At any rate, the time taken to construct this reverse mapping is only a little longer than the time taken to do a single search by brute force, so if you're doing 100 searches, it'll be nearly 100x faster this way, as well as simpler.

abarnert
  • 354,177
  • 51
  • 601
  • 671
  • This is great. Thnx for taking time to explain and put it together. The reverse mapping makes sense. My motivation behind my question was for processing json data that may have status: 'error' multiple times in the dictionary-style returned response, and then using the 'error' key path to identify bad data feed components, and also the nature of the error (i.e., the 'error message'). I wondered if json data comes down generally in a standardized format, such that there are standard serializers, or not in which case one must code based on the json format one is dealing with?--Cheers – Mike Sep 17 '13 at 00:31
  • It's simple and easy to understand. Thanks @abarnet – Md. Nazmul Haque Sarker Jun 23 '16 at 05:03
  • 2
    `import collections` – B. Shea Dec 21 '17 at 05:47
  • For Python 3.x, use `nested.items()` instead of `nested.iteritems()` as well as `import collections` as mentioned by @bshea for this to work – Si Mon Nov 26 '18 at 21:12
7

Here is one solution:

from copy import copy

example_dict = { 'key1' : 'value1',
                 'key2' : 'value2',
                 'key3' : { 'key3a': 'value3a' },
                 'key4' : { 'key4a': { 'key4aa': 'value4aa',
                                       'key4ab': 'value4ab',
                                       'key4ac': 'value1'},
                            'key4b': 'value4b'}
                }


result = []
path = []

def get_keys(d, target):
    for k, v in d.iteritems():
        path.append(k)
        if isinstance(v, dict):
            get_keys(v, target)
        if v == target:
            result.append(copy(path))
        path.pop()

Result:

>>> get_keys(example_dict, 'value1')
>>> result
[['key1'], ['key4', 'key4a', 'key4ac']]
Akavall
  • 82,592
  • 51
  • 207
  • 251