
I want to parse a JSON file and get a list containing all the full paths needed to access its keys. The keys() method gives a list of individual keys, but not the full hierarchical key paths needed to access the data.

So, given data such as:

data = {
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}

I would like to return a list like the one below, containing the full path to every key:

[['glossary', 'title'], ['glossary', 'GlossDiv'], ...]

Reading and accessing elements is fine. To achieve this result I tried to use the SO answer "Access nested dictionary items via a list of keys".

I don't really understand how this works and it returns only the word 'glossary'.

This is my code. I used ChainMap because it made it easier to convert the JSON to a dictionary and access its keys.

import json
from collections import ChainMap
from functools import reduce
import operator

myDataChained = ChainMap(data)

def getFromDict(data):
    return reduce(operator.getitem, data)

Json_Paths = getFromDict(myDataChained)
print(Json_Paths)
sayth
  • Are you actually trying to do the reverse of the linked answer, i.e. to get a list of all possible 'paths'? – zwer Jul 23 '18 at 23:04
  • Yes. I thought I could use the answer to obtain all the possible paths. That way I can hit any JSON I get with the script and know all the full paths to access its data. – sayth Jul 23 '18 at 23:15

1 Answer


You cannot use the same technique to do the reverse of the linked answer: you don't have the path information up front to traverse with the functools.reduce()/operator.getitem() combo. You're trying to obtain that information instead, i.e. to normalize/flatten your dictionary structure.
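To see why the code in the question prints only 'glossary', here is a small illustration on a trimmed-down copy of the data (the trimmed dictionary is just for brevity). Iterating over a mapping yields its top-level keys, so reduce() is handed a one-item sequence, and with no initializer it returns that single item without ever calling getitem.

```python
import operator
from collections import ChainMap
from functools import reduce

# Trimmed-down version of the question's data, for illustration only.
data = {"glossary": {"title": "example glossary"}}

# Iterating over a mapping (ChainMap included) yields only its top-level keys.
print(list(ChainMap(data)))  # ['glossary']

# reduce() therefore receives a single item; with one item and no
# initializer it returns that item unchanged and never calls getitem.
print(reduce(operator.getitem, ChainMap(data)))  # glossary
```

In other words, reduce()/getitem() only walks down the structure when you already pass it a list of keys plus the dictionary as the starting value.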

To do so, you'll have to iterate over the whole structure and collect all possible paths in your data, something like:

import collections.abc  # the ABCs moved here; plain collections.MutableMapping was removed in Python 3.10

def get_paths(source):
    paths = []
    if isinstance(source, collections.abc.MutableMapping):  # found a dict-like structure...
        for k, v in source.items():  # iterate over it; Python 2.x: source.iteritems()
            paths.append([k])  # add the current child path
            paths += [[k] + x for x in get_paths(v)]  # get sub-paths, extend with the current
    # else, check if a list-like structure, remove if you don't want list paths included
    elif isinstance(source, collections.abc.Sequence) and not isinstance(source, (str, bytes)):
        # strings are Sequences too, so skip them (Python 2.x: collections.Sequence, basestring)
        for i, v in enumerate(source):
            paths.append([i])
            paths += [[i] + x for x in get_paths(v)]  # get sub-paths, extend with the current
    return paths
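If you'd rather not build and concatenate intermediate lists at every level, the same recursive walk can be written as a generator. This is a sketch assuming Python 3.3+ (for yield from); the name iter_paths is hypothetical, and it yields tuples instead of lists, in the same order as get_paths().

```python
from collections.abc import MutableMapping, Sequence

def iter_paths(source, prefix=()):
    """Yield every key/index path in `source` as a tuple, parents before children."""
    if isinstance(source, MutableMapping):
        children = source.items()
    elif isinstance(source, Sequence) and not isinstance(source, (str, bytes)):
        children = enumerate(source)
    else:
        return  # scalar value: no paths below it
    for key, value in children:
        path = prefix + (key,)
        yield path  # the path to this child itself
        yield from iter_paths(value, path)  # then every path beneath it

sample = {"glossary": {"title": "example glossary", "GlossSeeAlso": ["GML", "XML"]}}
for path in iter_paths(sample):
    print(list(path))
```

Being lazy, it also lets you stop early (e.g. with itertools.islice) on large documents.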

Now if you run your data through it:

data = {
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages...",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}

paths = get_paths(data)

You'll get paths containing:

[['glossary'],
 ['glossary', 'title'],
 ['glossary', 'GlossDiv'],
 ['glossary', 'GlossDiv', 'title'],
 ['glossary', 'GlossDiv', 'GlossList'],
 ['glossary', 'GlossDiv', 'GlossList', 'GlossEntry'],
 ['glossary', 'GlossDiv', 'GlossList', 'GlossEntry', 'ID'],
 ['glossary', 'GlossDiv', 'GlossList', 'GlossEntry', 'SortAs'],
 ['glossary', 'GlossDiv', 'GlossList', 'GlossEntry', 'GlossTerm'],
 ['glossary', 'GlossDiv', 'GlossList', 'GlossEntry', 'Acronym'],
 ['glossary', 'GlossDiv', 'GlossList', 'GlossEntry', 'Abbrev'],
 ['glossary', 'GlossDiv', 'GlossList', 'GlossEntry', 'GlossDef'],
 ['glossary', 'GlossDiv', 'GlossList', 'GlossEntry', 'GlossDef', 'para'],
 ['glossary', 'GlossDiv', 'GlossList', 'GlossEntry', 'GlossDef', 'GlossSeeAlso'],
 ['glossary', 'GlossDiv', 'GlossList', 'GlossEntry', 'GlossDef', 'GlossSeeAlso', 0],
 ['glossary', 'GlossDiv', 'GlossList', 'GlossEntry', 'GlossDef', 'GlossSeeAlso', 1],
 ['glossary', 'GlossDiv', 'GlossList', 'GlossEntry', 'GlossSee']]

And you can feed any of those into that functools.reduce()/operator.getitem() combo to get the target value.
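For example, a minimal round trip looks like this. The paths are hard-coded here (copied from the shape of get_paths() output) so the snippet runs on its own, and get_by_path is a hypothetical helper name.

```python
import operator
from functools import reduce

data = {"glossary": {"title": "example glossary",
                     "GlossDiv": {"GlossSeeAlso": ["GML", "XML"]}}}

def get_by_path(source, path):
    # Apply getitem left to right: source[path[0]][path[1]]...
    return reduce(operator.getitem, path, source)

# A few of the paths get_paths() would produce for this data.
paths = [['glossary', 'title'],
         ['glossary', 'GlossDiv', 'GlossSeeAlso'],
         ['glossary', 'GlossDiv', 'GlossSeeAlso', 1]]

for path in paths:
    print(path, '->', get_by_path(data, path))
```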

zwer