
I have a very heavily nested JSON file with multiple blocks inside it. The following is an excerpt of the file; it has more than six levels of nesting like this:


{
  "title": "main questions",
  "type": "static",
  "value": {
    "title": "state your name",
    "type": "QUESTION",
    "locator": "namelocator"
  }
}

Can anyone help me parse this so that I can find the title and locator whenever type is "QUESTION" (the type varies across different parts of the file), and do it concurrently? Processing it sequentially would kill the system, given the size of the file.

I have been using the following code to get the values of title and locator separately. I installed the library with pip install jsonpath (in the Anaconda terminal):

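from jsonpath import JSONPath
import json as js

with open('questions.json') as f:  # 'questions.json' is a placeholder for the path to the .json file
    data = js.load(f)  # json.load expects an open file object, not a path

locators = JSONPath('$.[?(@.type== "QUESTION")].locator').parse(data)
titles = JSONPath('$.[?(@.type== "QUESTION")].title').parse(data)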

The problem is that I get a list of locators and a list of titles, but they are jumbled, because there is no way to know the order in which the function traverses the file. I have been stuck on this for a while. The only solution I can see is to go through the file to find every node with type == QUESTION and then loop again to pull out the locators and titles, which is not computationally feasible for a large set of files.

notsopeter
  • @rv.kvetch this is a dummy format, the file is actually json only – notsopeter Dec 29 '21 at 17:37
  • So this is a https://en.wikipedia.org/wiki/Tree_traversal question? Have you considered parsing it into a Python object with json.loads(data) so you can control the iteration? – Kenny Ostrom Dec 29 '21 at 17:46
  • @KennyOstrom No, I haven't; I will try it and update shortly, thanks. Update: the format of this object is still the same. Any other ideas, please? – notsopeter Dec 29 '21 at 17:51
  • The format is supposed to be the same. The idea is you control the iteration so when you find a question, you pull the locator and title together as one. Okay fine, can you include a pip command and import statement? I just get False when I use that expr on that data. – Kenny Ostrom Dec 29 '21 at 18:02
  • Just included the pip and import statements @KennyOstrom – notsopeter Dec 29 '21 at 18:31
  • Let me be more clear. You are not using the standard library, nor are you using "pip install jsonpath" so what library is this? Also, without knowing what library, I would guess you want "'$.[?(@.type== "QUESTION")]'" so you can get one object with both title and locator. – Kenny Ostrom Dec 29 '21 at 18:43
  • @KennyOstrom I have provided 'pip install jsonpath' in the description, and it is a standard library, but that's the problem: there are not many examples for it because it is so recent. Also, I get all the instances where the question is present, but the lack of ordering is still causing the issue – notsopeter Dec 29 '21 at 18:55
  • can you post the output of "pip show jsonpath"? Mine is 0.82 by Phil Budne – Kenny Ostrom Dec 29 '21 at 21:00
  • Anyhow, you need to group the title and locator together as one object at the time of parsing (which also means you only parse it once). – Kenny Ostrom Dec 29 '21 at 21:16

1 Answer


The key is to parse the file once and treat each match as a whole object, so that the correct title and locator stay grouped together. It is easy to split them afterwards if you need to.
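One way to take "parse once" literally, sketched here with only the standard library json module rather than the jsonpath library discussed in this thread, is the object_hook parameter of json.loads. The decoder calls the hook for every JSON object as soon as that object is decoded, so questions can be recorded during the parse itself, with title and locator kept as a pair. The function name parse_collecting_questions is hypothetical. Note that objects are reported inner-first: a question nested inside another question would be recorded before its parent.

```python
import json

def parse_collecting_questions(text):
    """Decode `text` once and collect (title, locator) for every QUESTION object.

    json.loads calls object_hook for each JSON object as soon as that object
    has been decoded, so inner objects are reported before the objects
    that contain them, and siblings are reported in document order.
    """
    found = []

    def hook(obj):
        # obj is the plain dict for one JSON object; keep its title and locator paired
        if obj.get("type") == "QUESTION":
            found.append((obj.get("title"), obj.get("locator")))
        return obj  # return it unchanged so the decoded tree stays intact

    data = json.loads(text, object_hook=hook)
    return data, found


text = """{
  "title": "main questions",
  "type": "static",
  "value": {
    "title": "state your name",
    "type": "QUESTION",
    "locator": "namelocator"
  }
}"""

data, questions = parse_collecting_questions(text)
print(questions)  # [('state your name', 'namelocator')]
```

For a file on disk, json.load(f, object_hook=hook) accepts the same parameter.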

Here's a code sample demonstrating all the various answers I made in comments. I don't know what exact library you're using, but they all seem to implement the same JSONPath, so you can probably use this. Just change the function names and parameter order to fit whatever library you actually have.

from jsonpath import jsonpath
import json

text = """{
"title": "main questions",
"type": "static",
"value":
    {
    "title": "state your name",
    "type": "QUESTION",
    "locator": "namelocator"
    }
}"""

# use jsonpath to find the question nodes
data = json.loads(text)
questions_parsed = jsonpath(obj=data, expr='$.[?(@.type== "QUESTION")]')
print(questions_parsed)

[{'title': 'state your name', 'type': 'QUESTION', 'locator': 'namelocator'}]

# python code to parse the same structure
def find_questions(data):
    if isinstance(data, dict):
        if 'type' in data and 'QUESTION' == data['type']:
            # TODO: write a dataclass, or validate that it has title and locator
            yield data
        elif 'value' in data:
            # recurse whether "value" holds a dict or a list; other types yield nothing
            yield from find_questions(data['value'])
    elif isinstance(data, list):
        for item in data:
            yield from find_questions(item)

questions = [(question['title'], question['locator']) for question in find_questions(json.loads(text))]
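find_questions only descends through the "value" key, which matches the excerpt but may miss questions stored under other keys in a file with six or more levels of nesting. A more general sketch, assuming any dict with "type": "QUESTION" should be reported no matter where it sits, walks every dict value and list item. It uses an explicit stack instead of recursion, so very deep nesting cannot hit Python's recursion limit, and it still yields pairs in document order. The name iter_questions and the "children" key in the sample data are hypothetical.

```python
import json

def iter_questions(data):
    """Yield (title, locator) for every dict whose type is "QUESTION",
    at any depth and under any key, in document order (pre-order DFS)."""
    stack = [data]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            if node.get("type") == "QUESTION":
                yield node.get("title"), node.get("locator")
            # keep descending (a question could contain nested questions);
            # push children reversed so they pop in their original order
            stack.extend(reversed(list(node.values())))
        elif isinstance(node, list):
            stack.extend(reversed(node))
        # strings, numbers, booleans and None have no children to visit


# hypothetical sample: questions under a list and under a "children" key
nested = json.loads("""{
  "title": "main questions",
  "type": "static",
  "value": [
    {"title": "state your name", "type": "QUESTION", "locator": "namelocator"},
    {"type": "static",
     "children": {"title": "state your age", "type": "QUESTION", "locator": "agelocator"}}
  ]
}""")

print(list(iter_questions(nested)))
# [('state your name', 'namelocator'), ('state your age', 'agelocator')]
```

Because it is a generator, the pairs can be consumed one at a time without building an intermediate list.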

As I said, it's easy to split the grouped pairs into separate lists if you need them (see the Stack Overflow question "How to unzip a list of tuples into individual lists?"):

titles, locators = (list(t) for t in zip(*questions))
print(titles)
print(locators)

['state your name']
['namelocator']

I used this implementation:

pip show jsonpath

Name: jsonpath
Version: 0.82
Summary: An XPath for JSON
Home-page: http://www.ultimate.com/phil/python/#jsonpath
Author: Phil Budne
Author-email: phil@ultimate.com
License: MIT
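On the concurrency part of the question: a single json.load call decodes one document as a whole, so the natural unit of parallelism is the file. The sketch below processes several files concurrently, one traversal per file, and keeps each file's (title, locator) pairs attached to that file; executor.map returns results in the order of the input paths. It defaults to threads, which mostly help when reading files is the bottleneck. JSON decoding is CPU-bound and the GIL limits threads, so ProcessPoolExecutor is usually the better fit there; it can be passed in as executor_cls, but the worker must be importable at module level (and on Windows/macOS the caller needs an `if __name__ == "__main__":` guard). The function names and file names are hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def questions_in_file(path):
    """Load one JSON file and return its (title, locator) pairs in document order."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    pairs = []
    stack = [data]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            if node.get("type") == "QUESTION":
                pairs.append((node.get("title"), node.get("locator")))
            stack.extend(reversed(list(node.values())))
        elif isinstance(node, list):
            stack.extend(reversed(node))
    return pairs

def questions_in_files(paths, executor_cls=ThreadPoolExecutor, max_workers=None):
    """Process several files concurrently; return {path: [(title, locator), ...]}."""
    with executor_cls(max_workers=max_workers) as executor:
        # map() yields results in the order of `paths`, regardless of completion order
        return dict(zip(paths, executor.map(questions_in_file, paths)))

# Example (hypothetical file names):
# results = questions_in_files(["part1.json", "part2.json"])
# For CPU-bound decoding:
# from concurrent.futures import ProcessPoolExecutor
# results = questions_in_files(paths, executor_cls=ProcessPoolExecutor)
```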

Kenny Ostrom
  • Thanks a lot for putting time into this problem. However, the issue is not extracting the questions and answers, but finding them in a single parse, rather than finding all the types first and then looping to get the questions and answers that correspond to them. I will definitely try building a class to validate locator and class in one go. Thanks again – notsopeter Dec 31 '21 at 01:51
  • Are you sure it's not a tree traversal question, like I asked in my first comment? Look at that link again, please. – Kenny Ostrom Dec 31 '21 at 23:40
  • I checked that link; it is not actually related. – notsopeter Jan 01 '22 at 19:46