4

I have a JSON object in Python represented as a nested lists of dictionaries. (Some of the values of the dictionary are dictionaries themselves, and so on.)

I want to be able to search for a key on all branches of this nested dictionary structure.
When I find the key I want to be able to return the full key path that leads to it.

For example: I'm looking for "special agents" who have a "special address key", but not all special agents have it, and those that do have it in inconsistent paths in their JSON.

So I search for key Special Address code. The result should return:

/'People'/'SpecialAgents'/'007'/'Special Address code'/  

So I will be able to reach its information in that way:

json_obj['People']['SpecialAgents']['007']['Special Address code']

Note that this is similar to this question but I need the full path to each instance of the key found.

Community
  • 1
  • 1
JavaSa
  • 5,813
  • 16
  • 71
  • 121
  • Sounds like you need a JSON equivalent of XPath. [This question](http://stackoverflow.com/questions/8481380/is-there-a-json-equivalent-of-xquery-xpath) has a few examples. – Daniel Roseman Jul 25 '15 at 10:17
  • what if there are multiple keys that are the same? – Padraic Cunningham Jul 25 '15 at 10:53
  • **See also:** https://stackoverflow.com/questions/7681301/search-for-a-key-in-a-nested-python-dictionary https://stackoverflow.com/a/16508328/42223 – dreftymac Oct 30 '17 at 19:56

2 Answers2

13

You need a recursive search.

You can define a function to deeply search in your input json:

def find_in_obj(obj, condition, path=None):

    if path is None:
        path = []    

    # In case this is a list
    if isinstance(obj, list):
        for index, value in enumerate(obj):
            new_path = list(path)
            new_path.append(index)
            for result in find_in_obj(value, condition, path=new_path):
                yield result 

    # In case this is a dictionary
    if isinstance(obj, dict):
        for key, value in obj.items():
            new_path = list(path)
            new_path.append(key)
            for result in find_in_obj(value, condition, path=new_path):
                yield result 

            if condition == key:
                new_path = list(path)
                new_path.append(key)
                yield new_path 

We can then use the example JSON in this similar SO question to test the recursive search:

In [15]: my_json = { "id" : "abcde",
   ....:   "key1" : "blah",
   ....:   "key2" : "blah blah",
   ....:   "nestedlist" : [ 
   ....:     { "id" : "qwerty",
   ....:       "nestednestedlist" : [ 
   ....:         { "id" : "xyz",
   ....:           "keyA" : "blah blah blah" },
   ....:         { "id" : "fghi",
   ....:           "keyZ" : "blah blah blah" }],
   ....:       "anothernestednestedlist" : [ 
   ....:         { "id" : "asdf",
   ....:           "keyQ" : "blah blah" },
   ....:         { "id" : "yuiop",
   ....:           "keyW" : "blah" }] } ] } 

Let's find every instance of the key 'id' and return the full path that gets us there:

In [16]: for item in find_in_obj(my_json, 'id'):
   ....:     print item
   ....:     
['nestedlist', 0, 'nestednestedlist', 0, 'id']
['nestedlist', 0, 'nestednestedlist', 1, 'id']
['nestedlist', 0, 'id']
['nestedlist', 0, 'anothernestednestedlist', 0, 'id']
['nestedlist', 0, 'anothernestednestedlist', 1, 'id']
['id']
Community
  • 1
  • 1
Jossef Harush Kadouri
  • 32,361
  • 10
  • 130
  • 129
  • +1 This works. Nice. I tested it searching for `id` in the object from [this question](http://stackoverflow.com/questions/9807634/find-all-occurences-of-a-key-in-nested-python-dictionaries-and-lists). Any comments on how it works, and where you got it from? (Did you write all this code from scratch?) – LondonRob Jul 25 '15 at 11:14
  • 1
    @LondonRob this function iterates keys and stores a state of the current path. it's based on yield return (which simplifies the solution). wrote it from scratch followed by OP description – Jossef Harush Kadouri Jul 25 '15 at 11:15
  • Mind if I edit this a bit to show a more complete example? It shows your work off better than your single result does currently. – LondonRob Jul 25 '15 at 11:17
  • @LondonRob feel free to improve this answer (and any other one on this site :P ) – Jossef Harush Kadouri Jul 25 '15 at 11:21
3

You need to search a tree. Here's the easiest way to do it.

It could be enhanced - for example, it is better to use None as default arg value instead of some object. Also, this is depth first search - you may want to get only one result, and that's when width first search is better (read about both those terms on wikipedia if you don't know them).

import json

example_json = """{
 "someList" : [
  {
   "x": {
    "y": {
     "z": "Some value"
    }
   }
  }, 
  {
   "x": {
    "y": {
     "a": "Wrong key"
    }
   }
  }
 ]
}
"""

struct = json.loads(example_json)

def find_all_with_key(wanted_key, tree, path=tuple()):
    if isinstance(tree, list):
        for idx, el in enumerate(tree):
            yield from find_all_with_key(wanted_key, el, path+(idx,))
    elif isinstance(tree, dict):
        for k in tree:
            if k == wanted_key:
                yield path +(k, )
        # you can add order of width-search by sorting result of tree.items()
        for k, v in tree.items(): 
            yield from find_all_with_key(wanted_key, v, path+(k,))

def retrieve(tree, path):
    for p in path:
        tree = tree[p]
    return tree

result = list(find_all_with_key("z", struct))
expected = [ ("someList", 0, "x", "y", "z") ]

assert result == expected
assert retrieve(struct, result[0]) == "Some value"
Filip Malczak
  • 3,124
  • 24
  • 44
  • this is python3 only syntax, you also don't need to call .keys to iterate over the dict keys – Padraic Cunningham Jul 25 '15 at 10:34
  • Told ya, this could use some enhancements :P Another ugly thing is usage of isinstance(), but that's not the point. Also, I think that "python3 only" isn't a bad thing - I'm a strong supporter of migrating to new line instead of using p2.7. – Filip Malczak Jul 25 '15 at 10:45
  • Well until pythons fairy godmother comes along and clicks her fingers to make all python code ever written python 3 compatible I guess python2 code will still have to be written. I don't see anything wrong with `isinstance()` at all that is necessary unlike keys ;) – Padraic Cunningham Jul 25 '15 at 10:55
  • Fixed. Anyway, only difference is that in p2 you should use for x in X: yield x instead of yield from X. More on it: http://stackoverflow.com/a/17581397/1219585 if anyone would be looking – Filip Malczak Jul 25 '15 at 13:33