Python--Finding Parent Keys for a specific value in a nested dictionary

Question

I am struggling to process a nested dictionary, and return the nested Parent Keys, for a specific Value, when the Value may exist more than once in the nested dictionary. For example:

example_dict = { 'key1' : 'value1',
                 'key2' : 'value2',
                 'key3' : { 'key3a': 'value3a' },
                 'key4' : { 'key4a': { 'key4aa': 'value4aa',
                                       'key4ab': 'value4ab',
                                       'key4ac': 'value1'},
                            'key4b': 'value4b'}
                }

You will notice that 'value1' appears twice in the above dictionary, and I would like to create a function that returns either a single list, or a series of lists, that identify the different Parent Keys, which in this case would be 'key1' and ('key4', 'key4a', key4ac).

This type of problem was dealt with elsewhere on this site, when the Value one was looking for only appeared once, and was readily handled by the following recursive function:

def find_key(d,key):
    for k,v in d.items():
        if isinstance(v,dict):
            p = find_key(v,key)
            if p:
                return [k] + p
        elif v == key:
            return [k]

print find_key(example_dict,'value4ac').

If you run the above code on the dictionary, I only get one answer for the parent keys. Any help would be much appreciated, Thanks!

Are you doing such searches repeatedly, or only doing one ever? If you're doing more than one, you'll almost certainly want to create a reverse-mapping dictionary, and just access that, instead of brute-force searching the entire dictionary each time. — abarnert, Sep 16 '13 at 04:14

score 12 · Accepted Answer · answered Sep 16 '13 at 04:29

Unless you're just doing a single search (or you're incredibly constrained on memory but have CPU time to burn…), you'll want to build a reverse-lookup dictionary, and then you can just use that.

To make that easier, I'm going to do it in two steps. First, convert a nested dictionary to a key-path dictionary:

def keypaths(nested):
    for key, value in nested.iteritems():
        if isinstance(value, collections.Mapping):
            for subkey, subvalue in keypaths(value):
                yield [key] + subkey, subvalue
        else:
            yield [key], value

Print out list(keypaths(example_dict)) if it isn't obvious what this does.

Now, how do you create a reverse dictionary? For a one-to-one-mapping, you can just do this:

reverse_dict = {value: keypath for keypath, value in keypaths(example_dict)}

But for a many-to-one mapping like yours, the reverse is one-to-many, so we need to map each value to a list of keys. So:

reverse_dict = {}
for keypath, value in keypaths(example_dict):
    reverse_dict.setdefault(value, []).append(keypath)

And now you don't need anything fancy; just do a normal dict lookup on reverse_dict:

>>> reverse_dict['value2']
[('key2',)]
>>> reverse_dict['value1']
[('key1',), ('key4', 'key4a', 'key4ac')]
>>> reverse_dict['value3']
KeyError: 'value3'

If you'd prefer the last one to return [] instead of raising a KeyError, you can use a defaultdict(list) instead of a plain dict, and then you don't need setdefault.

At any rate, the time taken to construct this reverse mapping is only a little longer than the time taken to do a single search by brute force, so if you're doing 100 searches, it'll be nearly 100x faster this way, as well as simpler.

This is great. Thnx for taking time to explain and put it together. The reverse mapping makes sense. My motivation behind my question was for processing json data that may have status: 'error' multiple times in the dictionary-style returned response, and then using the 'error' key path to identify bad data feed components, and also the nature of the error (i.e., the 'error message'). I wondered if json data comes down generally in a standardized format, such that there are standard serializers, or not in which case one must code based on the json format one is dealing with?--Cheers — Mike, Sep 17 '13 at 00:31
For Python 3.x, use `nested.items()` instead of `nested.iteritems()` as well as `import collections` as mentioned by @bshea for this to work — Si Mon, Nov 26 '18 at 21:12

score 7 · Answer 2 · answered Sep 16 '13 at 01:44

Here is one solution:

from copy import copy

example_dict = { 'key1' : 'value1',
                 'key2' : 'value2',
                 'key3' : { 'key3a': 'value3a' },
                 'key4' : { 'key4a': { 'key4aa': 'value4aa',
                                       'key4ab': 'value4ab',
                                       'key4ac': 'value1'},
                            'key4b': 'value4b'}
                }


result = []
path = []

def get_keys(d, target):
    for k, v in d.iteritems():
        path.append(k)
        if isinstance(v, dict):
            get_keys(v, target)
        if v == target:
            result.append(copy(path))
        path.pop()

Result:

>>> get_keys(example_dict, 'value1')
>>> result
[['key1'], ['key4', 'key4a', 'key4ac']]

Python--Finding Parent Keys for a specific value in a nested dictionary

2 Answers2

Linked