How to iterate over JSON with children nodes using Python?

  • Could you be a bit more specific with the data you are trying to access within the JSON string? – E Joseph Oct 30 '22 at 18:25
  • I basically want the same JSON, but excluding (i.e., removing) the 'configurations' entries that have "vulnerable" : false, and then to output the filtered results to a new JSON file. – paliknight Oct 30 '22 at 18:30
  • Your question has a lot of noise, such as `find all java (only) vulnerabilities`, `then order (descending) the results by cvss3 severity score`, and `The file is extremely large`. That discourages people from answering and cannot be addressed within the scope of a single question. Consider splitting the problem into several simpler ones (1. reading the file, 2. filtering the file, 3. sorting the file, 4. saving the file) and try to answer each one separately. – Ahmed AEK Oct 30 '22 at 18:39
  • Thank you for the recommendation. Hopefully my update is less discouraging. – paliknight Oct 30 '22 at 18:46

2 Answers


This solution should solve your initial problem (removing the entries with "vulnerable": false). I used the sample JSON data you provided in the question, so it only touches the first node of the first CVE item.

import json

with open('data.json','r') as f:
    data = json.load(f)

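# cpe_match list of the first node of the first CVE item (matches the sample data)
nodes = data.get('CVE_Items')[0].get('configurations').get('nodes')[0].get('cpe_match')

# Keep only the vulnerable entries. Slice assignment updates the list in place;
# calling pop() while enumerating would skip the entry after each removed one.
nodes[:] = [node for node in nodes if node.get('vulnerable')]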

with open('new_data.json','w') as f:
    json.dump(data, f)
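
A note on removing list items in place: calling pop() on a list while enumerating it shifts the remaining elements left, so the entry right after each removed one is never inspected. The sketch below contrasts that with rebuilding the list through slice assignment. The entries and the two helper names are made up for illustration and are not taken from the NVD file.

```python
# Hypothetical cpe_match entries, for illustration only.
sample = [
    {"cpe23Uri": "a", "vulnerable": False},
    {"cpe23Uri": "b", "vulnerable": False},
    {"cpe23Uri": "c", "vulnerable": True},
]


def filter_with_pop(entries):
    """Pop non-vulnerable entries while enumerating (skips neighbours)."""
    entries = list(entries)  # work on a copy so the sample stays intact
    for index, entry in enumerate(entries):
        if not entry.get("vulnerable"):
            entries.pop(index)
    return entries


def filter_in_place(entries):
    """Keep only vulnerable entries; slice assignment mutates the same list."""
    entries[:] = [entry for entry in entries if entry.get("vulnerable")]
    return entries


popped = filter_with_pop(sample)
filtered = filter_in_place(list(sample))

# 'b' survives the pop-based loop even though it is not vulnerable.
print([entry["cpe23Uri"] for entry in popped])
print([entry["cpe23Uri"] for entry in filtered])
```

Because slice assignment changes the existing list object, the parent dictionary that holds the list sees the filtered result without being reassigned.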
E Joseph
  • While this may be correct for the JSON visible on Stack Overflow, the question also includes a larger file, and this code likely won't go through the full input. – Andrew Ryan Oct 30 '22 at 19:01
  • I assumed the sample data would be in the same format as the original file. Let me see if I can update it to fit the original data. – E Joseph Oct 30 '22 at 19:03

Here is a way to read in the data, clean it (remove the entries where 'vulnerable' is false), sort it, and then write it out as a new file.

import json

def base_score(item):  # sort key used by .sort()
    # https://stackoverflow.com/questions/3121979/how-to-sort-a-list-tuple-of-lists-tuples-by-the-element-at-a-given-index
    cve_id = item['cve']['CVE_data_meta']['ID']
    if 'baseMetricV3' not in item['impact']:
        return (0, cve_id)  # no CVSS v3 score: use 0 so these items group together and sort by ID
    return (item['impact']['baseMetricV3']['cvssV3']['baseScore'], cve_id)  # equal scores are ordered by ID

with open('nvdcve-1.1-2022.json', 'r') as file: # read in the file and load it as a json format (similar to python dictionaries)
    dict_data = json.load(file)

for CVE_Item in dict_data['CVE_Items']:
    for node in CVE_Item['configurations']['nodes']:
        # https://stackoverflow.com/questions/1207406/how-to-remove-items-from-a-list-while-iterating
        # Slice assignment replaces the list contents in place, keeping only vulnerable entries.
        node['cpe_match'][:] = [item for item in node['cpe_match'] if item['vulnerable']]
        # Apply the same filter to the direct children of the node (one level deep).
        for child_node in node['children']:
            child_node['cpe_match'][:] = [item for item in child_node['cpe_match'] if item['vulnerable']]

dict_data['CVE_Items'].sort(reverse=True, key=base_score) # sort the data and have it in descending order.

with open('cleaned_nvdcve-1.1-2022.json','w') as f: # write the file to the current working directory.
    json.dump(dict_data, f)
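
If the 'children' lists can themselves contain nodes with further children, the same filter can be applied recursively. The following is only a sketch under that assumption: the helper name `keep_vulnerable` and the trimmed sample item are hypothetical, and the real file should be checked for how deep the nesting actually goes.

```python
def keep_vulnerable(node):
    """Recursively drop cpe_match entries whose 'vulnerable' flag is false."""
    node["cpe_match"] = [
        match for match in node.get("cpe_match", []) if match.get("vulnerable")
    ]
    for child in node.get("children", []):
        keep_vulnerable(child)


# Hypothetical, heavily trimmed stand-in for one entry of CVE_Items.
cve_item = {
    "configurations": {
        "nodes": [
            {
                "cpe_match": [],
                "children": [
                    {
                        "cpe_match": [
                            {"cpe23Uri": "x", "vulnerable": True},
                            {"cpe23Uri": "y", "vulnerable": False},
                        ],
                        "children": [
                            {
                                "cpe_match": [
                                    {"cpe23Uri": "z", "vulnerable": False}
                                ],
                                "children": [],
                            }
                        ],
                    }
                ],
            }
        ]
    }
}

for node in cve_item["configurations"]["nodes"]:
    keep_vulnerable(node)

print(cve_item["configurations"]["nodes"][0]["children"][0]["cpe_match"])
```

Using `.get()` with an empty-list default means a node without a 'children' or 'cpe_match' key is handled without raising a KeyError.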
Andrew Ryan
  • Thank you very much for your help. I updated the initial post. Further down in the JSON, under 'impact', there is cvssV3 then its basescore, respectively. I need to sort it (descending order) by the cvssV3 basescore value (numerical). – paliknight Oct 30 '22 at 19:21
  • thank you very much for your help. the results need to be ordered by the cvssV3 base score, and if not available, by the CVE_ID. – paliknight Oct 30 '22 at 20:01
  • @paliknight where is CVE_ID? I don't see it when I look it up in the file. do you mean the 'ID' in 'CVE_data_meta'? – Andrew Ryan Oct 30 '22 at 20:04
  • hello, sorry for the ambiguity. yes thats what i meant – paliknight Oct 30 '22 at 20:08
  • is there a way to have the scores listed first, in descending order, then the ID's listed after, in descending order? – paliknight Oct 30 '22 at 20:18
  • @paliknight I have updated the answer to sort that way; check out the comments in `base_score`. – Andrew Ryan Oct 30 '22 at 20:25
  • Thank you again! I am having an issue: even when using your code, the output still contains "vulnerable" : false entries. It didn't remove the false ones and keep only the true ones. – paliknight Oct 30 '22 at 20:32
  • I didn't see this before, but there are multiple levels of dicts containing the 'vulnerable' key (dicts inside the 'children' list of those dicts), unlike what the sample code you gave suggested. I will try to give it a look later. Do you need those filtered out as well? Just making sure before diving into it. – Andrew Ryan Oct 30 '22 at 21:08
  • Yes please. Any vulnerability that is false needs to be removed. I cant thank you enough for your generosity. By the way, the instructor asked to find all the Java vulnerabilities, but not JavaScript vulnerabilities. How do I distinguish between the two within a JSON? – paliknight Oct 30 '22 at 21:20
  • @paliknight how is something identified as java vs. javascript? – Andrew Ryan Oct 30 '22 at 21:31
  • That's what was really confusing. I went through the entire file and didn't see anything Java vs. JavaScript related. – paliknight Oct 30 '22 at 21:37
  • You've been a tremendous help. Thank you again for everything. This is perfect. I will simply ignore the Java vs JS part and have the instructor elaborate. – paliknight Oct 30 '22 at 21:54
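
The thread leaves the Java versus JavaScript question open, and nothing in it defines what makes an item Java-related. One possible heuristic, purely an assumption here, is to search each item's description text for "java" as a whole word, which does not match inside "JavaScript". The sketch assumes the descriptions live under item['cve']['description']['description_data'][i]['value']; that path is not shown in the thread and should be verified against the file. The helper name and the sample items are hypothetical.

```python
import re

# Whole-word match: "Java" and "java.lang" match, "JavaScript" does not.
JAVA_WORD = re.compile(r"\bjava\b", re.IGNORECASE)


def mentions_java(item):
    """Return True if any description of the item mentions Java as a word."""
    descriptions = (
        item.get("cve", {}).get("description", {}).get("description_data", [])
    )
    return any(JAVA_WORD.search(d.get("value", "")) for d in descriptions)


# Made-up items for illustration only (assumed key layout, not real CVEs).
items = [
    {"cve": {"description": {"description_data": [
        {"value": "Flaw in the Java SE runtime."}]}}},
    {"cve": {"description": {"description_data": [
        {"value": "XSS through crafted JavaScript."}]}}},
    {"cve": {"description": {"description_data": [
        {"value": "Unrelated buffer overflow."}]}}},
]

java_items = [item for item in items if mentions_java(item)]
print(len(java_items))
```

A text heuristic like this can miss items that never name Java in their description, so the instructor's intended definition still takes precedence.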