
I am trying to interact with an API and am running into issues accessing nested objects. Below is sample JSON output that I am working with.

{
    "results": [
        {
            "task_id": "22774853-2b2c-49f4-b044-2d053141b635",
            "params": {
                "type": "host",
                "target": "54.243.80.16",
                "source": "malware_analysis"
            },
            "v": "2.0.2",
            "status": "success",
            "time": 227,
            "data": {
                "details": {
                    "as_owner": "Amazon.com, Inc.",
                    "asn": "14618",
                    "country": "US",
                    "detected_urls": [],
                    "resolutions": [
                        {
                            "hostname": "bumbleride.com",
                            "last_resolved": "2016-09-15 00:00:00"
                        },
                        {
                            "hostname": "chilitechnology.com",
                            "last_resolved": "2016-09-16 00:00:00"
                        }
                    ],
                    "response_code": 1,
                    "verbose_msg": "IP address in dataset"
                },
                "match": true
            }
        }
    ]
}

The deepest I am able to access is the data portion, which returns too much. Ideally I just want to access as_owner, asn, country, detected_urls, and resolutions.

When I try to access details, response_code, etc., I get a KeyError. My nested JSON goes deeper than the examples in other questions, and I have tried the logic from those answers.

Below is my current code snippet and any help is appreciated!

import requests
import json

headers = {
    'Content-Type': 'application/json',
}

params = (
    ('wait', 'true'),
)

data = '{"target":{"one":{"type": "ip","target": "54.243.80.16", "sources": ["xxx","xxxxx"]}}}'

r = requests.post('https://fakewebsite:8000/api/services/intel/lookup/jobs',
                  headers=headers, params=params, data=data, auth=('apikey', ''))
parsed_json = json.loads(r.text)
#results = parsed_json["results"]
for item in parsed_json["results"]:
    print(item['data'])


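You just need to index correctly into the converted JSON: each item in the "results" list has a "data" dictionary, and the fields you want live in its "details" dictionary. Since they are all in "details", you can then easily loop over a list of the keys you want to fetch.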

import json

raw = '''\
{
    "results": [
        {
            "task_id": "22774853-2b2c-49f4-b044-2d053141b635",
            "params": {
                "type": "host",
                "target": "54.243.80.16",
                "source": "malware_analysis"
            },
            "v": "2.0.2",
            "status": "success",
            "time": 227,
            "data": {
                "details": {
                    "as_owner": "Amazon.com, Inc.",
                    "asn": "14618",
                    "country": "US",
                    "detected_urls": [],
                    "resolutions": [
                        {
                            "hostname": "bumbleride.com",
                            "last_resolved": "2016-09-15 00:00:00"
                        },
                        {
                            "hostname": "chilitechnology.com",
                            "last_resolved": "2016-09-16 00:00:00"
                        }
                    ],
                    "response_code": 1,
                    "verbose_msg": "IP address in dataset"
                },
                "match": true
            }
        }
    ]
}
'''

parsed_json = json.loads(raw)

wanted = ['as_owner', 'asn', 'country', 'detected_urls', 'resolutions']

for item in parsed_json["results"]:
    details = item['data']['details']
    for key in wanted:
        print(key, ':', json.dumps(details[key], indent=4))
    # Put a blank line at the end of the details for each item
    print()

Output:

as_owner : "Amazon.com, Inc."
asn : "14618"
country : "US"
detected_urls : []
resolutions : [
    {
        "hostname": "bumbleride.com",
        "last_resolved": "2016-09-15 00:00:00"
    },
    {
        "hostname": "chilitechnology.com",
        "last_resolved": "2016-09-16 00:00:00"
    }
]
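If you only need a single value, or want to go one level deeper into the "resolutions" list, you can chain the indexes along the same path. Here's a minimal sketch run on a trimmed copy of the sample response; note that `[0]` picks only the first result, whereas the loop above handles every item:

```python
import json

# Trimmed copy of the sample response from the question
raw = '''\
{
    "results": [
        {
            "data": {
                "details": {
                    "as_owner": "Amazon.com, Inc.",
                    "resolutions": [
                        {"hostname": "bumbleride.com", "last_resolved": "2016-09-15 00:00:00"},
                        {"hostname": "chilitechnology.com", "last_resolved": "2016-09-16 00:00:00"}
                    ]
                },
                "match": true
            }
        }
    ]
}
'''

parsed_json = json.loads(raw)

# "results" is a list, so index it by position; "data" and "details" are dicts, so index them by key
as_owner = parsed_json["results"][0]["data"]["details"]["as_owner"]
print(as_owner)  # Amazon.com, Inc.

# "resolutions" is a list of dicts: loop over it to pull one field out of each entry
details = parsed_json["results"][0]["data"]["details"]
hostnames = [res["hostname"] for res in details["resolutions"]]
print(hostnames)  # ['bumbleride.com', 'chilitechnology.com']
```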

BTW, when you fetch JSON data using requests there's no need to use json.loads: you can get the converted JSON by calling the .json() method of the returned Response object instead of using its .text attribute.
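Here's a sketch of that, wrapped in a hypothetical helper called details_from_response. It assumes the response is a requests Response (any object with a .json() method works the same way); FakeResponse is only a stand-in so the snippet runs without contacting the real API:

```python
import json


def details_from_response(response):
    """Return the "details" dict of every result in a lookup response.

    Hypothetical helper: `response` is assumed to be a requests.Response,
    or any object with a .json() method that returns the parsed body.
    """
    parsed_json = response.json()  # replaces json.loads(response.text)
    return [item["data"]["details"] for item in parsed_json["results"]]


class FakeResponse:
    """Offline stand-in for requests.Response, used only for this demo."""

    def __init__(self, text):
        self.text = text

    def json(self):
        return json.loads(self.text)


fake = FakeResponse('{"results": [{"data": {"details": {"asn": "14618", "country": "US"}}}]}')
for details in details_from_response(fake):
    print(details["asn"], details["country"])
```

With the real API you would pass in the `r` returned by `requests.post` in the question instead of `fake`.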


Here's a more robust version of the main loop of the above code. It simply ignores any missing keys. I didn't post this code earlier because the extra if tests make it slightly less efficient, and I didn't know that keys could be missing.

for item in parsed_json["results"]:
    if 'data' not in item:
        continue
    data = item['data']
    if 'details' not in data:
        continue
    details = data['details']
    for key in wanted:
        if key in details:
            print(key, ':', json.dumps(details[key], indent=4))
    # Put a blank line at the end of the details for each item
    print()
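For comparison, here's a sketch of the same skip-what's-missing behavior for the nested lookup written in the try/except (EAFP) style that's common in Python: the lookup is simply attempted, and a KeyError means the item gets skipped. collect_details is a hypothetical name, and it returns the values instead of printing them so they're easier to reuse:

```python
def collect_details(parsed_json, wanted):
    """Return one dict per result, holding only the wanted keys that are present.

    Items with no "data" or no "details" section are skipped entirely.
    """
    collected = []
    for item in parsed_json["results"]:
        try:
            details = item["data"]["details"]
        except KeyError:
            continue  # "data" or "details" is missing: skip this item
        collected.append({key: details[key] for key in wanted if key in details})
    return collected


wanted = ['as_owner', 'asn', 'country', 'detected_urls', 'resolutions']

# Hypothetical sample: a match, a "data" section with no "details", and an item with no "data"
sample = {
    "results": [
        {"data": {"details": {"as_owner": "Amazon.com, Inc.", "asn": "14618"}, "match": True}},
        {"data": {"match": False}},
        {"status": "success"},
    ]
}

for details in collect_details(sample, wanted):
    print(details)
```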
  • I do get output in the console sometimes, but other times I get this KeyError: `Traceback (most recent call last): File "C:/Users/Test/PycharmProjects/dnsdumpster/testt.py", line 21: details = item['data']['details'] KeyError: 'details'` – user2934204 Nov 10 '17 at 14:41
  • @user2934204 You should have mentioned that sometimes the "details" section may be missing. That's easy to deal with. Are there any other keys that might be missing? Will _every_ item in a "results" list _always_ contain a "data" section? How about 'as_owner', 'asn', 'country', 'detected_urls', 'resolutions'? – PM 2Ring Nov 10 '17 at 15:09
  • I apologize for not knowing that details may be missing. Basically, this is querying a threat service. If there isn't a match for the specified field, the details section would theoretically be missing. It will always contain a data section, regardless of whether there are any matches. – user2934204 Nov 10 '17 at 15:26
  • @user2934204 Ok. I've just added some more code that handles _any_ missing keys. – PM 2Ring Nov 10 '17 at 15:29
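Given that data is always present and only details can be missing, another option is dict.get with a default. Instead of skipping unmatched lookups, this keeps one row per result and fills in None for anything absent. It's a sketch that relies on that guarantee, and lookup_rows is a hypothetical name:

```python
def lookup_rows(parsed_json, wanted):
    """Return one dict per result; any wanted field that is missing becomes None.

    Assumes every item has a "data" section, while "details" only appears
    when the lookup found a match.
    """
    rows = []
    for item in parsed_json["results"]:
        details = item["data"].get("details", {})  # empty dict when there was no match
        rows.append({key: details.get(key) for key in wanted})
    return rows


wanted = ['as_owner', 'asn', 'country', 'detected_urls', 'resolutions']

# Hypothetical sample: one matched lookup and one with no "details"
sample = {
    "results": [
        {"data": {"details": {"as_owner": "Amazon.com, Inc.", "asn": "14618", "country": "US",
                              "detected_urls": [], "resolutions": []}, "match": True}},
        {"data": {"match": False}},
    ]
}

for row in lookup_rows(sample, wanted):
    print(row)
```

One trade-off: a field that is missing and a field whose JSON value is null both come back as None, so use the key-skipping versions if you need to tell them apart.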