I want to convert each branch of a JSON tree structure into a list of items in that branch. I want to do it using loops but I can't access the objects using indices.

Example JSON:
{
    "Root": { "child1": "abc",
              "child2": "def",
              "child3": { "grandchild1": "nick",
                           "grandchild2": "Sam"
                        }
             }
 }

I want to traverse them and store them as follows:

list1 = ['Root', "child1", "abc"]
list2 = ['Root', "child2", "def"]
list3 = ['Root', "child3", "grandchild1", "nick"]
list4 = ['Root', "child3", "grandchild2", "Sam"]

I read the JSON as follows:

import json

with open('sample.json') as f:
    tree = json.load(f)

Problem: I want to loop through these items and append them to separate lists, but I can only access them through their keys: tree['Root'] gives child1, child2 and child3, and tree['Root']['child3'] gives the two grandchildren. This approach does not scale to my use case, where the JSON file has 1400 branches (quite deeply nested) and I want to create one list per branch.

Any ideas how to do this efficiently?

utengr

1 Answer

Using the `yield from` statement (Python 3.3+) and a recursive function:

tree = {
"Root": { "Child1": "abc",
          "Child2": "def",
          "Child3": { "grandchild1": "nick",
                      "grandchild2": "Sam"
                    }
         }
}

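def walk_json(tree, path=None):
    if path is None:  # avoid a mutable default argument
        path = []
    try:
        for root, child in tree.items():
            # recurse into the child, extending the path with the current key
            yield from walk_json(child, path + [root])
    except AttributeError: # in case .items() is not possible (on leaves)
        yield path + [tree]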

list(walk_json(tree))

will output:

[['Root', 'Child1', 'abc'],
['Root', 'Child2', 'def'],
['Root', 'Child3', 'grandchild1', 'nick'],
['Root', 'Child3', 'grandchild2', 'Sam']]
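
For readers unfamiliar with `yield from`, here is a minimal sketch of an equivalent version that spells out the delegation as an explicit loop. The name `walk_json_explicit` is made up for illustration, and the sketch swaps the `try`/`except` for an `isinstance` check, on the assumption that anything that is not a dict (including JSON arrays) should be treated as a leaf:

```python
def walk_json_explicit(node, path=()):
    """Yield one list per root-to-leaf branch of a nested dict.

    Hypothetical variant of walk_json, written without `yield from`.
    """
    if isinstance(node, dict):
        for key, child in node.items():
            # The inner loop below does what
            #     yield from walk_json_explicit(child, path + (key,))
            # would do: re-yield every branch produced by the recursive call.
            for branch in walk_json_explicit(child, path + (key,)):
                yield branch
    else:
        # Leaf: assumed to be any non-dict value (string, number, list, None).
        yield list(path) + [node]


tree = {
    "Root": {
        "Child1": "abc",
        "Child2": "def",
        "Child3": {"grandchild1": "nick", "grandchild2": "Sam"},
    }
}

branches = list(walk_json_explicit(tree))
for branch in branches:
    print(branch)
```

Because this is a generator, the branches are produced one at a time, so a file with many branches does not need to be expanded into nested lists up front. Note that an empty dict yields no branch at all in this sketch.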
Guillaume
  • Good answer, but a bare `except:` can make troubleshooting difficult. Probably better to use `except AttributeError:`. – glibdud Oct 03 '17 at 13:58
  • @glibdud: very true, fixed. – Guillaume Oct 03 '17 at 14:00
  • @Guillaume works fine. I accepted the answer but it would be great for others if you can add a basic explanation of the yield from statement. – utengr Oct 03 '17 at 14:06
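  • `yield` is a keyword similar to `return`; the difference is that `yield` hands a value back to the caller and suspends the generator, which resumes from that point when the next value is requested. – Hejun Oct 03 '17 at 14:08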
  • @engr_s: added link to the official doc, you can also see https://stackoverflow.com/questions/9708902/in-practice-what-are-the-main-uses-for-the-new-yield-from-syntax-in-python-3 – Guillaume Oct 03 '17 at 14:09
  • `for i in walk_json(tree): print(i)` gives the OP's desired output: `['Root', 'Child1', 'abc']`, `['Root', 'Child2', 'def']`, `['Root', 'Child3', 'grandchild1', 'nick']`, `['Root', 'Child3', 'grandchild2', 'Sam']` – Subham Mar 24 '21 at 07:47