44

I wrote some code to get data from a web API. I was able to parse the JSON data from the API, but the result I gets looks quite complex. Here is one example:

>>> my_json
{'name': 'ns1:timeSeriesResponseType', 'declaredType': 'org.cuahsi.waterml.TimeSeriesResponseType', 'scope': 'javax.xml.bind.JAXBElement$GlobalScope', 'value': {'queryInfo': {'creationTime': 1349724919000, 'queryURL': 'http://waterservices.usgs.gov/nwis/iv/', 'criteria': {'locationParam': '[ALL:103232434]', 'variableParam': '[00060, 00065]'}, 'note': [{'value': '[ALL:103232434]', 'title': 'filter:sites'}, {'value': '[mode=LATEST, modifiedSince=null]', 'title': 'filter:timeRange'}, {'value': 'sdas01', 'title': 'server'}]}}, 'nil': False, 'globalScope': True, 'typeSubstituted': False}

Looking through this data, I can see the specific data I want: the 1349724919000 value that is labelled as 'creationTime'.

How can I write code that directly gets this value?

I don't need any searching logic to find this value. I can see what I need when I look at the response; I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way. I read some tutorials, so I understand that I need to use [] to access elements of the nested lists and dictionaries; but I can't figure out exactly how it works for a complex case.

More generally, how can I figure out what the "path" is to the data, and write the code for it?

Karl Knechtel
  • 62,466
  • 11
  • 102
  • 153
knu2xs
  • 910
  • 2
  • 11
  • 23

5 Answers5

85

For reference, let's see what the original JSON would look like, with pretty formatting:

>>> print(json.dumps(my_json, indent=4))
{
    "name": "ns1:timeSeriesResponseType",
    "declaredType": "org.cuahsi.waterml.TimeSeriesResponseType",
    "scope": "javax.xml.bind.JAXBElement$GlobalScope",
    "value": {
        "queryInfo": {
            "creationTime": 1349724919000,
            "queryURL": "http://waterservices.usgs.gov/nwis/iv/",
            "criteria": {
                "locationParam": "[ALL:103232434]",
                "variableParam": "[00060, 00065]"
            },
            "note": [
                {
                    "value": "[ALL:103232434]",
                    "title": "filter:sites"
                },
                {
                    "value": "[mode=LATEST, modifiedSince=null]",
                    "title": "filter:timeRange"
                },
                {
                    "value": "sdas01",
                    "title": "server"
                }
            ]
        }
    },
    "nil": false,
    "globalScope": true,
    "typeSubstituted": false
}

That lets us see the structure of the data more clearly.

In the specific case, first we want to look at the corresponding value under the 'value' key in our parsed data. That is another dict; we can access the value of its 'queryInfo' key in the same way, and similarly the 'creationTime' from there.

To get the desired value, we simply put those accesses one after another:

my_json['value']['queryInfo']['creationTime'] # 1349724919000
Karl Knechtel
  • 62,466
  • 11
  • 102
  • 153
dm03514
  • 54,664
  • 18
  • 108
  • 145
21

I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way.

If you access the API again, the new data might not match the code's expectation. You may find it useful to add some error handling. For example, use .get() to access dictionaries in the data, rather than indexing:

name = my_json.get('name') # will return None if 'name' doesn't exist

Another way is to test for a key explicitly:

if 'name' in resp_dict:
    name = resp_dict['name']
else:
    pass

However, these approaches may fail if further accesses are required. A placeholder result of None isn't a dictionary or a list, so attempts to access it that way will fail again (with TypeError). Since "Simple is better than complex" and "it's easier to ask for forgiveness than permission", the straightforward solution is to use exception handling:

try:
    creation_time = my_json['value']['queryInfo']['creationTime']
except (TypeError, KeyError):
    print("could not read the creation time!")
    # or substitute a placeholder, or raise a new exception, etc.
Karl Knechtel
  • 62,466
  • 11
  • 102
  • 153
ButtersB
  • 495
  • 1
  • 7
  • 14
7

Here is an example of loading a single value from simple JSON data, and converting back and forth to JSON:

import json

# load the data into an element
data={"test1": "1", "test2": "2", "test3": "3"}

# dumps the json object into an element
json_str = json.dumps(data)

# load the json to a string
resp = json.loads(json_str)

# print the resp
print(resp)

# extract an element in the response
print(resp['test1'])
Karl Knechtel
  • 62,466
  • 11
  • 102
  • 153
Sireesh Yarlagadda
  • 12,978
  • 3
  • 74
  • 76
  • 1
    I edited this answer to add some explanation and make the code a bit neater, but it's still not very useful. It seems that OP already understood how to handle simple cases, from reading some other tutorials; the question is about how to handle nesting in the data structure. – Karl Knechtel Jul 02 '22 at 00:53
5

Try this.

Here, I fetch only statecode from the COVID API (a JSON array).

import requests

r = requests.get('https://api.covid19india.org/data.json')

x = r.json()['statewise']

for i in x:
  print(i['statecode'])
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
  • What is the *"the COVID API"*? Can you elaborate a little bit? Please respond by [editing (changing) your answer](https://stackoverflow.com/posts/67092033/edit), not here in comments (***without*** "Edit:", "Update:", or similar - the answer should appear as if it was written today). – Peter Mortensen Jul 11 '22 at 19:50
-1

Try this:

from functools import reduce
import re


def deep_get_imps(data, key: str):
    split_keys = re.split("[\\[\\]]", key)
    out_data = data
    for split_key in split_keys:
        if split_key == "":
            return out_data
        elif isinstance(out_data, dict):
            out_data = out_data.get(split_key)
        elif isinstance(out_data, list):
            try:
                sub = int(split_key)
            except ValueError:
                return None
            else:
                length = len(out_data)
                out_data = out_data[sub] if -length <= sub < length else None
        else:
            return None
    return out_data


def deep_get(dictionary, keys):
    return reduce(deep_get_imps, keys.split("."), dictionary)

Then you can use it like below:

res = {
    "status": 200,
    "info": {
        "name": "Test",
        "date": "2021-06-12"
    },
    "result": [{
        "name": "test1",
        "value": 2.5
    }, {
        "name": "test2",
        "value": 1.9
    },{
        "name": "test1",
        "value": 3.1
    }]
}

>>> deep_get(res, "info")
{'name': 'Test', 'date': '2021-06-12'}
>>> deep_get(res, "info.date")
'2021-06-12'
>>> deep_get(res, "result")
[{'name': 'test1', 'value': 2.5}, {'name': 'test2', 'value': 1.9}, {'name': 'test1', 'value': 3.1}]
>>> deep_get(res, "result[2]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[-1]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[2].name")
'test1'
azec-pdx
  • 4,790
  • 6
  • 56
  • 87
Heedo Lee
  • 59
  • 3
  • This is an interesting code project, but being able to write `deep_get(res, "result[2].name")` and have it work doesn't seem like an improvement over just directly asking for `res['result'][2]['name']`. – Karl Knechtel Jul 02 '22 at 00:54