81

Importing from JSON can get very complex and nested structures. For example:

{u'body': [{u'declarations': [{u'id': {u'name': u'i',
                                       u'type': u'Identifier'},
                               u'init': {u'type': u'Literal', u'value': 2},
                               u'type': u'VariableDeclarator'}],
            u'kind': u'var',
            u'type': u'VariableDeclaration'},
           {u'declarations': [{u'id': {u'name': u'j',
                                       u'type': u'Identifier'},
                               u'init': {u'type': u'Literal', u'value': 4},
                               u'type': u'VariableDeclarator'}],
            u'kind': u'var',
            u'type': u'VariableDeclaration'},
           {u'declarations': [{u'id': {u'name': u'answer',
                                       u'type': u'Identifier'},
                               u'init': {u'left': {u'name': u'i',
                                                   u'type': u'Identifier'},
                                         u'operator': u'*',
                                         u'right': {u'name': u'j',
                                                    u'type': u'Identifier'},
                                         u'type': u'BinaryExpression'},
                               u'type': u'VariableDeclarator'}],
            u'kind': u'var',
            u'type': u'VariableDeclaration'}],
 u'type': u'Program'}

What is the recommended way to walk complex structures like the above?

Apart of a few list there are mostly dictionaries, the structure can become even more imbricated so I need a general solution.

martineau
  • 119,623
  • 25
  • 170
  • 301
Eduard Florinescu
  • 16,747
  • 28
  • 113
  • 179

10 Answers10

71

You can use a recursive generator for converting your dictionary to flat lists.

def dict_generator(indict, pre=None):
    pre = pre[:] if pre else []
    if isinstance(indict, dict):
        for key, value in indict.items():
            if isinstance(value, dict):
                for d in dict_generator(value, pre + [key]):
                    yield d
            elif isinstance(value, list) or isinstance(value, tuple):
                for v in value:
                    for d in dict_generator(v, pre + [key]):
                        yield d
            else:
                yield pre + [key, value]
    else:
        yield pre + [indict]

It returns

[u'body', u'kind', u'var']
[u'init', u'declarations', u'body', u'type', u'Literal']
[u'init', u'declarations', u'body', u'value', 2]
[u'declarations', u'body', u'type', u'VariableDeclarator']
[u'id', u'declarations', u'body', u'type', u'Identifier']
[u'id', u'declarations', u'body', u'name', u'i']
[u'body', u'type', u'VariableDeclaration']
[u'body', u'kind', u'var']
[u'init', u'declarations', u'body', u'type', u'Literal']
[u'init', u'declarations', u'body', u'value', 4]
[u'declarations', u'body', u'type', u'VariableDeclarator']
[u'id', u'declarations', u'body', u'type', u'Identifier']
[u'id', u'declarations', u'body', u'name', u'j']
[u'body', u'type', u'VariableDeclaration']
[u'body', u'kind', u'var']
[u'init', u'declarations', u'body', u'operator', u'*']
[u'right', u'init', u'declarations', u'body', u'type', u'Identifier']
[u'right', u'init', u'declarations', u'body', u'name', u'j']
[u'init', u'declarations', u'body', u'type', u'BinaryExpression']
[u'left', u'init', u'declarations', u'body', u'type', u'Identifier']
[u'left', u'init', u'declarations', u'body', u'name', u'i']
[u'declarations', u'body', u'type', u'VariableDeclarator']
[u'id', u'declarations', u'body', u'type', u'Identifier']
[u'id', u'declarations', u'body', u'name', u'answer']
[u'body', u'type', u'VariableDeclaration']
[u'type', u'Program']
funnydman
  • 9,083
  • 4
  • 40
  • 55
Valentin Briukhanov
  • 1,263
  • 9
  • 13
55

If you only need to walk the dictionary, I'd suggest using a recursive walk function that takes a dictionary and then recursively walks through its elements. Something like this:

def walk(node):
    for key, item in node.items():
        if item is a collection:
            walk(item)
        else:
            It is a leaf, do your thing

If you also want to search for elements, or query several elements that pass certain criteria, have a look at the jsonpath module.

Hans Then
  • 10,935
  • 3
  • 32
  • 51
  • 14
    This only works for directly nested dictionaries. In the example data structure, there were several dictionary values that were lists of other dictionaries. Some extra logic will be required to handle those (such as recursing inside a list comprehension or generator expression). To make things work right, you probably need to use your knowledge of the meaning of the data, such as the fact that all the dictionaries have a "type" key. – Blckknght Sep 20 '12 at 23:14
  • 1
    You should also be careful to check for cycles. e.g.: foo = {}; foo['bar'] = foo will blow you up. Your walk function should keep a list of, and not descend, objects it has seen. (you could pass the list to each successive call) – Cory Mar 25 '15 at 07:18
  • 1
    if item is a collection throws an error. what is your import statement – Golden Lion Feb 05 '21 at 18:26
10

Instead of writing your own parser, depending on the task, you could extend encoders and decoders from the standard library json module.

I recommend this especially if you need to encode objects belonging to custom classes into the json. If you have to do some operation which could be done also on a string representation of the json, consider also iterating JSONEncoder().iterencode

For both the reference is http://docs.python.org/2/library/json.html#encoders-and-decoders

danza
  • 11,511
  • 8
  • 40
  • 47
7

If you know the meaning of the data, you might want to create a parse function to turn the nested containers into a tree of objects of custom types. You'd then use methods of those custom objects to do whatever you need to do with the data.

For your example data structure, you might create Program, VariableDeclaration, VariableDeclarator, Identifier, Literal and BinaryExpression classes, then use something like this for your parser:

def parse(d):
    t = d[u"type"]

    if t == u"Program":
        body = [parse(block) for block in d[u"body"]]
        return Program(body)

    else if t == u"VariableDeclaration":
        kind = d[u"kind"]
        declarations = [parse(declaration) for declaration in d[u"declarations"]]
        return VariableDeclaration(kind, declarations)

    else if t == u"VariableDeclarator":
        id = parse(d[u"id"])
        init = parse(d[u"init"])
        return VariableDeclarator(id, init)

    else if t == u"Identifier":
        return Identifier(d[u"name"])

    else if t == u"Literal":
        return Literal(d[u"value"])

    else if t == u"BinaryExpression":
        operator = d[u"operator"]
        left = parse(d[u"left"])
        right = parse(d[u"right"])
        return BinaryExpression(operator, left, right)

    else:
        raise ValueError("Invalid data structure.")
Blckknght
  • 100,903
  • 11
  • 120
  • 169
6

Maybe can help:

def walk(d):
    global path
      for k,v in d.items():
          if isinstance(v, str) or isinstance(v, int) or isinstance(v, float):
            path.append(k)
            print "{}={}".format(".".join(path), v)
            path.pop()
          elif v is None:
            path.append(k)
            ## do something special
            path.pop()
          elif isinstance(v, dict):
            path.append(k)
            walk(v)
            path.pop()
          else:
            print "###Type {} not recognized: {}.{}={}".format(type(v), ".".join(path),k, v)

mydict = {'Other': {'Stuff': {'Here': {'Key': 'Value'}}}, 'root1': {'address': {'country': 'Brazil', 'city': 'Sao', 'x': 'Pinheiros'}, 'surname': 'Fabiano', 'name': 'Silos', 'height': 1.9}, 'root2': {'address': {'country': 'Brazil', 'detail': {'neighbourhood': 'Central'}, 'city': 'Recife'}, 'surname': 'My', 'name': 'Friend', 'height': 1.78}}

path = []
walk(mydict)

Will produce output like this:

Other.Stuff.Here.Key=Value 
root1.height=1.9 
root1.surname=Fabiano 
root1.name=Silos 
root1.address.country=Brazil 
root1.address.x=Pinheiros 
root1.address.city=Sao 
root2.height=1.78 
root2.surname=My 
root2.name=Friend 
root2.address.country=Brazil 
root2.address.detail.neighbourhood=Central 
root2.address.city=Recife 
Fabiano Silos
  • 71
  • 1
  • 3
6

To walk/map over an entire JSON structure, you can use the following code:

def walk(node, key):
    if type(node) is dict:
        return {k: walk(v, k) for k, v in node.items()}
    elif type(node) is list:
        return [walk(x, key) for x in node]
    else:
        return YourFunction(node, key)

def YourFunction(node, key):
    if key == "yourTargetField":   # for example, you want to modify yourTargetField
        return "Modified Value"
    return node # return existing value

This will go through your entire json structure and run every leaf (endpoint key-value pair) through your function. Yourfunction returns the modified value. The entire walk function will give you a new, processed object, in order.

4

If the accepted answer works for you, but you'd also like a full, ordered path with the numerical index of the nested arrays included, this slight variation will work:

def dict_generator(indict, pre=None):
    pre = pre[:] if pre else []
    if isinstance(indict, dict):
        for key, value in indict.items():
            if isinstance(value, dict):
                for d in dict_generator(value,  pre + [key]):
                    yield d
            elif isinstance(value, list) or isinstance(value, tuple):
                for k,v in enumerate(value):
                    for d in dict_generator(v, pre + [key] + [k]):
                        yield d
            else:
                yield pre + [key, value]
    else:
        yield indict
Brideau
  • 4,564
  • 4
  • 24
  • 33
4

My more flexible version is as follows:

def walktree(tree, at=lambda node: not isinstance(node, dict), prefix=(), 
                flattennode=lambda node:isinstance(node, (list, tuple, set))):
    """
    Traverse a tree, and return a iterator of the paths from the root nodes to the leaf nodes.
    tree: like '{'a':{'b':1,'c':2}}'
    at: a bool function or a int indicates levels num to go down. walktree(tree, at=1) equivalent to tree.items()
    flattennode: a bool function to decide whether to iterate at node value
    """
    if isinstance(at, int):
        isleaf_ = at == 0
        isleaf = lambda v: isleaf_
        at = at - 1
    else:
        isleaf = at
    if isleaf(tree):
        if not flattennode(tree):
            yield (*prefix, tree)
        else:
            for v in tree:
                yield from walktree(v, at, prefix, flattennode=flattennode)
    else:
        for k,v in tree.items():
            yield from walktree(v, at, (*prefix, k), flattennode=flattennode)

usage:

> list(walktree({'a':{'b':[0,1],'c':2}, 'd':3}))
[('a', 'b', 0), ('a', 'b', 1), ('a', 'c', 2), ('d', 3)]
> list(walktree({'a':{'b':[0,1],'c':2}, 'd':3}, flattennode=lambda e:False))
[('a', 'b', [0, 1]), ('a', 'c', 2), ('d', 3)]
> list(walktree({'a':{'b':[0,1],'c':2}, 'd':3}, at=1))
[('a', {'b': [0, 1], 'c': 2}), ('d', 3)]
> list(walktree({'a':{'b':[0,{1:9,9:1}],'c':2}, 'd':3}))
[('a', 'b', 0), ('a', 'b', 1, 9), ('a', 'b', 9, 1), ('a', 'c', 2), ('d', 3)]
> list(walktree([1,2,3,[4,5]]))
[(1,), (2,), (3,), (4,), (5,)]
guoyongzhi
  • 121
  • 1
  • 3
3

Some addition to solution above (to handle json including lists)

#!/usr/bin/env python

import json

def walk(d):
   global path
   for k,v in d.items():
      if isinstance(v, str) or isinstance(v, int) or isinstance(v, float):
         path.append(k)
         print("{}={}".format(".".join(path), v)) 
         path.pop()
      elif v is None:
         path.append(k)
         # do something special
         path.pop()
      elif isinstance(v, list):
         path.append(k)
         for v_int in v:
            walk(v_int)
         path.pop()
      elif isinstance(v, dict):
         path.append(k)
         walk(v)
         path.pop()
      else:
         print("###Type {} not recognized: {}.{}={}".format(type(v), ".".join(path),k, v))

with open('abc.json') as f:
   myjson = json.load(f)

path = []
walk(myjson)
BurAk
  • 31
  • 2
3

A bit improved version of the accepted solution. If you're using Python3 (I hope you do) you can use yield from syntax that was introduced in Python 3.3. Also, isinstance as a second argument can accept tuple:

def dict_generator(adict, pre=None):
    pre = pre[:] if pre else []
    if isinstance(adict, dict):
        for key, value in adict.items():
            if isinstance(value, dict):
                yield from dict_generator(value, pre + [key])
            elif isinstance(value, (list, tuple)):
                for v in value:
                    yield from dict_generator(v, pre + [key])
            else:
                yield pre + [key, value]
    else:
        yield pre + [adict]

Iterative version (pretty similar if you're using generators):

def flatten(adict):
    stack = [[adict, []]]
    while stack:
        d, pre = stack.pop()
        if isinstance(d, dict):
            for key, value in d.items():
                if isinstance(value, dict):
                    stack.append([value, pre + [key]])
                elif isinstance(value, (list, tuple)):
                    for v in value:
                        stack.append([v, pre + [key]])
                else:
                    print(pre + [key, value])
        else:
            print(pre + [d])
funnydman
  • 9,083
  • 4
  • 40
  • 55