Python/API Request - Extract data from API request with dynamic output

Question

I'm working with API requests for the first time, and I'm wondering if the following is possible to do:

I have a function that receives API responses that look like this:

{
    "DATA_Items": [{
        "DATA": {
            "DATA_data_meta": {
                "ASSIGNER": "info@gmail.com",
                "ID": "DATA-2021-43062"
            },
            "data_format": "IRE",
            "data_type": "DATA",
            "data_version": "4.0",
            "description": {
                "description_data": [{
                    "lang": "en",
                    "value": "during web page generation."
                }]
            },
            "problemtype": {
                "problemtype_data": [{
                    "description": [{
                        "lang": "en",
                        "value": "CWE-79"
                    }]
                }]
            },
            "references": {
                "reference_data": [{
                        "name": "https://blablabla/data",
                        "refsource": "CONFIRM",
                        "tags": ["Advisory"],
                        "url": "https://blablabl/data"
                    },
                    {
                        "name": "http://blablabla.com/files/166055/mail-7.0.1-Cross-Site-Scripting.html",
                        "refsource": "MISC",
                        "tags": ["Exploit",
                            "Advisory",
                            "Entry"
                        ],
                        "url": "http://package.com/files/166055/mail-7.0.1-Cross-Site-Scripting.html"
                    }
                ]
            }
        },
        "configurations": {
            "DATA_data_version": "4.0",
            "nodes": [{
                "ID_match": [{
                        "ID23Uri": "ID:2.3:a:info:mail:*:*:*:*:*:*:*:*",
                        "ID_name": [],
                        "versionEndExcluding": "2.0.2",
                        "versionStartIncluding": "3.0.0",
                        "vulnerable": true
                    },
                    {
                        "ID23Uri": "ID:2.3:a:info:mail:*:*:*:*:*:*:*:*",
                        "ID_name": [],
                        "versionEndExcluding": "6.46",
                        "versionStartIncluding": "9.7.0",
                        "vulnerable": true
                    },
                    {
                        "ID23Uri": "ID:2.3:a:info:mail:*:*:*:*:*:*:*:*",
                        "ID_name": [],
                        "versionEndExcluding": "6.2.8",
                        "versionStartIncluding": "2.2.0",
                        "vulnerable": true
                    }
                ],
                "children": [],
                "operator": "OR"
            }]
        },
        "impact": {
            "baseMetricV2": {
                "acInsufInfo": false,
                "datasV2": {
                    "Impact": "NONE",
                    "accessComplexity": "MEDIUM",
                    "accessVector": "NETWORK",
                    "authentication": "NONE",
                    "availabilityImpact": "NONE",
                    "baseScore": 4.3,
                    "integrityImpact": "PARTIAL",
                    "vectorString": "AV:N/AC:M/Au:N/C:N/I:P/A:N",
                    "version": "2.0"
                },
                "exploitabilityScore": 8.6,
                "impactScore": 2.9,
                "obtainAllPrivilege": false,
                "obtainOtherPrivilege": false,
                "obtainUserPrivilege": false,
                "severity": "MEDIUM",
                "userInteractionRequired": true
            },
            "baseMetricV3": {
                "Score": 2.8,
                "impactScore": 1.7,
                "sV3": {
                    "Complexity": "LOW",
                    "Vector": "NETWORK",
                    "availabilityImpact": "NONE",
                    "baseScore": 6.1,
                    "baseSeverity": "MEDIUM",
                    "confidentialityImpact": "LOW",
                    "integrityImpact": "LOW",
                    "privilegesRequired": "NONE",
                    "scope": "CHANGED",
                    "userInteraction": "REQUIRED",
                    "vectorString": "BASE",
                    "version": "3.1"
                }
            }
        },
        "lastModifiedDate": "2012-03-04T16:33Z",
        "publishedDate": "2012-05-02T11:15Z"
    }]
}

The part that I'm interested in is:

{"ID_match": [{"vulnerable": true

The vulnerable entry inside ID_match always looks the same.

However, ID_match can appear in multiple places in the API response, and I'm not sure about all the different possibilities. I do have code that loops over some of the ID_match lists, which looks like this:

import datetime
import requests

now = datetime.datetime.now()
# Use timedelta for the window; adding to or subtracting from date.day breaks at month boundaries.
start = now - datetime.timedelta(days=2)
end = now + datetime.timedelta(days=1)
response = requests.get('https://somewebsite?dataMatchString={}&modStartDate={:%Y-%m-%d}T00:00:00:000%20CEST&modEndDate={:%Y-%m-%d}T00:00:00:000%20CEST'.format(
                application.id, start, end)).json()

for check in response['result']['DATA_Items'][0]['configurations']['nodes'][0]['children'][0]['ID_match']:
   print(check)
for check2 in response['result']['DATA_Items'][0]['configurations']['nodes'][0]['children'][1]['ID_match']:
   print(check2)
for check3 in response['result']['DATA_Items'][0]['configurations']['nodes'][0]['ID_match']:
   print(check3)

When I run this, some API responses print the part I want, but the code also misses some matches.

Is it possible to search for the path(s) where ID_match occurs, and then use them to get the value of vulnerable?

Don't use string `.format()` to build your URL. Use a parameters dict, like this: `requests.get('https://server/path', params={'dataMatchString': application.id, 'modStartDate': date1, 'modEndDate': date2})`, and use [this](https://stackoverflow.com/q/2150739/18771) to create `date1` and `date2` in the proper format. — Tomalak, Mar 06 '22 at 16:46
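
The parameters-dict suggestion in the comment above could look roughly like the sketch below. The helper name `build_query_params`, the sample match string `'mail'`, and the window of two days back to one day ahead are assumptions for illustration (the window mirrors what the question's code appears to intend). The date format is copied from the question's URL. Whether the server accepts the values once they are URL-encoded is not confirmed here.

```python
import datetime


def build_query_params(match_string, now=None):
    """Build the query parameters as a dict instead of formatting them into the URL.

    Hypothetical helper: the parameter names come from the question's URL,
    the window size is an assumption.
    """
    now = now or datetime.datetime.now()
    # timedelta handles month and year boundaries correctly.
    start = now - datetime.timedelta(days=2)
    end = now + datetime.timedelta(days=1)
    # Same layout as in the question's URL, with a plain space instead of %20;
    # the HTTP library takes care of the encoding.
    fmt = '%Y-%m-%dT00:00:00:000 CEST'
    return {
        'dataMatchString': match_string,
        'modStartDate': start.strftime(fmt),
        'modEndDate': end.strftime(fmt),
    }


# Fixed date so the month boundary is visible in the result.
params = build_query_params('mail', datetime.datetime(2022, 3, 1))
print(params)
```

The resulting dict would then be passed as `requests.get('https://somewebsite', params=params)`, which leaves the escaping of spaces and colons to `requests`.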

Tomalak · Accepted Answer · 2022-03-06T16:36:02.953

You could use a recursive generator function that traverses the entire object graph, looking at all nested dicts and all list items, and yields those dicts that have a 'vulnerable': True entry.

def find_vulnerable_nodes(node):
    if isinstance(node, list):
        for item in node:
            yield from find_vulnerable_nodes(item)
    elif isinstance(node, dict):
        if node.get('vulnerable') is True:  # JSON true is parsed as Python True
            yield node
        else:
            for item in node.values():
                yield from find_vulnerable_nodes(item)

This way, the structure and nesting depth of the input data is irrelevant.

Usage:

data = requests.get('...').json()

for n in find_vulnerable_nodes(data):
    print(n)

or

vulnerable_nodes = list(find_vulnerable_nodes(data))

Result with your sample data:

{'ID23Uri': 'ID:2.3:a:info:mail:*:*:*:*:*:*:*:*', 'ID_name': [], 'versionEndExcluding': '2.0.2', 'versionStartIncluding': '3.0.0', 'vulnerable': True}
{'ID23Uri': 'ID:2.3:a:info:mail:*:*:*:*:*:*:*:*', 'ID_name': [], 'versionEndExcluding': '6.46', 'versionStartIncluding': '9.7.0', 'vulnerable': True}
{'ID23Uri': 'ID:2.3:a:info:mail:*:*:*:*:*:*:*:*', 'ID_name': [], 'versionEndExcluding': '6.2.8', 'versionStartIncluding': '2.2.0', 'vulnerable': True}
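
Since the question also asks for the path(s) to each match, the same traversal can carry the path along. This is a minimal sketch, not part of the original answer: the function name `find_vulnerable_paths` and the small `sample` dict are made up for illustration and only mimic the nesting of the real response.

```python
def find_vulnerable_paths(node, path=()):
    """Yield (path, dict) pairs for every dict that has 'vulnerable': True.

    The path is a tuple of dict keys and list indexes leading to the dict.
    """
    if isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_vulnerable_paths(item, path + (index,))
    elif isinstance(node, dict):
        if node.get('vulnerable') is True:
            yield path, node
        else:
            for key, value in node.items():
                yield from find_vulnerable_paths(value, path + (key,))


# Made-up sample with ID_match at two different nesting levels.
sample = {
    'configurations': {
        'nodes': [{
            'ID_match': [{'ID23Uri': 'a', 'vulnerable': True}],
            'children': [{
                'ID_match': [
                    {'ID23Uri': 'b', 'vulnerable': True},
                    {'ID23Uri': 'c', 'vulnerable': False},
                ],
            }],
        }],
    },
}

for path, match in find_vulnerable_paths(sample):
    print(path, match['ID23Uri'])
```

Each path can be replayed against the original data by indexing step by step, for example `sample['configurations']['nodes'][0]['ID_match'][0]` for the first result.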

This works perfectly, thank you so much, also for editing my question, it looks much better now =)! — user2133561, Mar 06 '22 at 18:35
