
Given data organized in JSON format (code example below), how can we get the path of keys and sub-keys associated with a given value?

For example, given the input "23314", we need to return a list with: Fanerozoico, Cenozoico, Quaternario, Pleistocenico, Superior.

Since the data is in a JSON file, we decoded it using Python and the json library:

import json

def decode_crono(crono_file):
    # Parse the JSON file into nested Python dictionaries.
    with open(crono_file) as json_file:
        data = json.load(json_file)
    return data

From here on, we do not know how to process the data to get what we need. We can access keys like this:

k = data["Fanerozoico"]["Cenozoico"]["Quaternario"]["Pleistocenico"].keys()

or values like this:

v = data["Fanerozoico"]["Cenozoico"]["Quaternario"]["Pleistocenico"]["Superior"].values()

but this is still far from what we need.

{
"Fanerozoico": {
    "id": "20000",
    "Cenozoico": {
        "id": "23000",
        "Quaternario": {
            "id": "23300",
            "Pleistocenico": {
                "id": "23310",
                "Superior": {
                    "id": "23314"
                },
                "Medio": {
                    "id": "23313"
                },
                "Calabriano": {
                    "id": "23312"
                },
                "Gelasiano": {
                    "id": "23311"
                }
            }
        }
    }
}
}
AbreuFreire
  • You'll get considerably better reception here on Stack Overflow if you show us your attempts at solving the problem, rather than just asking us to do your work for you. – g.d.d.c Jul 07 '16 at 16:42
  • Adding to the gentle hint of @g.d.d.c, two questions while we await the prior art: 1) By "all" keys in combination with a table, do you mean an inversion of the structure, where the keys are the "cases" (values of the id fields) and the values are the paths from the root, given as all the JSON keys needed to navigate there? 2) What exactly is "a table as output with 1X5", i.e. what does 1X5 mean? Guessing at the latter: do you mean 1 times 5, alluding to only seeking id values of depth 5 (so **not** "20000" or "23000" in the sample)? Thanks – Dilettant Jul 07 '16 at 16:50
  • @Dilettant thanks, I edited the question to clarify what we need. 1) In this JSON structure, keys are like nodes identified by the code in the id field; if you use ["key1"].keys() you get ["key1a", "key1b", "key1c"]; as far as I understand, we do not want a physical inversion of the structure. 2) After getting those keys as output, we want to write them to a table.dbf with 1 row x 5 columns. – AbreuFreire Jul 07 '16 at 18:16

2 Answers

It's a little hard to understand exactly what you are after here, but it seems you have a bunch of nested JSON and you want to search it for an id and return a list that represents the path down the nesting. If so, the quick and easy approach is to recurse on the dictionary (that you got from json.load) and collect the keys as you go. When you find an 'id' key that matches the id you are searching for, you are done. Here is some code that does that:

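def all_keys(search_dict, key_id):
    def _all_keys(search_dict, key_id, keys=None):
        if not keys:
            keys = []
        for i in search_dict:
            # Match found: return the path so far plus the 'id' key itself.
            if i == 'id' and search_dict[i] == key_id:
                return keys + [i]
            if isinstance(search_dict[i], dict):
                potential_keys = _all_keys(search_dict[i], key_id, keys + [i])
                # An 'id' entry in the result marks a successful search below.
                if 'id' in potential_keys:
                    keys = potential_keys
                    break
        return keys
    # Drop the trailing 'id' so that only the path of keys remains.
    return _all_keys(search_dict, key_id)[:-1]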

The reason for the nested function is to strip off the 'id' key that would otherwise be on the end of the list.

This is really just to give you an idea of what a solution might look like. Beware the Python recursion limit (1000 frames by default in CPython)!
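
If the recursion limit is a concern, the same search can be written with an explicit stack instead of recursive calls. The following is only a minimal sketch under assumptions: the function name find_path_iterative, the id_key parameter, and the sample dictionary (copied from the question) are illustrative and not part of the original answer.

```python
def find_path_iterative(tree, target, id_key="id"):
    """Return the list of keys leading to the dict whose id_key equals
    target, or None if no such dict exists."""
    # Each stack entry pairs a dict with the path of keys that leads to it.
    stack = [(tree, [])]
    while stack:
        node, path = stack.pop()
        if node.get(id_key) == target:
            return path
        for key, child in node.items():
            if isinstance(child, dict):
                stack.append((child, path + [key]))
    return None


# Sample data copied from the question.
sample = {
    "Fanerozoico": {
        "id": "20000",
        "Cenozoico": {
            "id": "23000",
            "Quaternario": {
                "id": "23300",
                "Pleistocenico": {
                    "id": "23310",
                    "Superior": {"id": "23314"},
                    "Medio": {"id": "23313"},
                    "Calabriano": {"id": "23312"},
                    "Gelasiano": {"id": "23311"},
                },
            },
        },
    }
}

print(find_path_iterative(sample, "23314"))
```

Because pending nodes are kept in a list rather than on the call stack, the nesting depth is bounded by available memory instead of by the interpreter's recursion limit.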

Stephen

Based on the assumption that you need the full dictionary path down to the point where a key named id has a particular value, here's a recursive solution that iterates over the whole dict. Bear in mind that:

  • The code is not optimized at all
  • For very deeply nested JSON objects it might overflow the stack (in Python, hit the recursion limit) :)
  • It stops at the first matching value it encounters (in theory there shouldn't be more than one if the JSON is semantically correct)

The code:

import json

SEARCH_KEY_NAME = "id"
FOUND_FLAG = ()  # empty tuple: marks a match and seeds the path built on the way back up
CRONO_FILE = "a.jsn"


def decode_crono(crono_file):
    with open(crono_file) as json_file:
        return json.load(json_file)


def traverse_dict(dict_obj, value):
    for key in dict_obj:
        key_obj = dict_obj[key]
        if key == SEARCH_KEY_NAME and key_obj == value:
            return FOUND_FLAG
        elif isinstance(key_obj, dict):  # types.DictType no longer exists in Python 3
            inner = traverse_dict(key_obj, value)
            if inner is not None:
                # Prepend the current key to the path found below.
                return (key,) + inner
    return None


if __name__ == "__main__":
    value = "23314"
    json_dict = decode_crono(CRONO_FILE)
    result = traverse_dict(json_dict, value)
    print(result)
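
If many ids have to be looked up, one option (related to the "inversion" raised in the comments on the question) is to walk the structure once and build a dictionary that maps each id value to its path. This is a hedged sketch, not part of the solution above: index_paths, walk, and the inline sample data (copied from the question) are hypothetical names chosen for illustration, and it assumes id values are unique.

```python
def index_paths(tree, id_key="id"):
    """Map every id value found in the nested dict to the tuple of keys
    that leads to the dict containing it."""
    index = {}

    def walk(node, path):
        if id_key in node:
            index[node[id_key]] = path
        for key, child in node.items():
            if isinstance(child, dict):
                walk(child, path + (key,))

    walk(tree, ())
    return index


# Sample data copied from the question.
sample = {
    "Fanerozoico": {
        "id": "20000",
        "Cenozoico": {
            "id": "23000",
            "Quaternario": {
                "id": "23300",
                "Pleistocenico": {
                    "id": "23310",
                    "Superior": {"id": "23314"},
                    "Medio": {"id": "23313"},
                    "Calabriano": {"id": "23312"},
                    "Gelasiano": {"id": "23311"},
                },
            },
        },
    }
}

paths = index_paths(sample)
print(paths["23314"])
```

Building the index costs one full traversal, after which each lookup is a plain dictionary access. The traversal is still recursive, so the same depth caveat applies.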
CristiFati