
I'm trying to check whether a username is available (the entity with Type = account) by parsing the "Entities" key. If none is available, I want to get the IP from the entity with Type = ip.

data = '[\n    {\n        "TimeRecorded": "2020-09-20T08:56:12Z",\n        "AlertName": "Alert1",\n        "Entities": "[\\r\\n  {\\r\\n    \\"$id\\": \\"3\\",\\r\\n    \\"Address\\": \\"160.160.100.2\\",\\r\\n    \\"Type\\": \\"ip\\"\\r\\n  },\\r\\n  {\\r\\n    \\"$id\\": \\"4\\",\\r\\n    \\"DnsDomain\\": \\"example.com\\",\\r\\n    \\"HostName\\": \\"MyMachine1\\",\\r\\n    \\"Type\\": \\"host\\"\\r\\n  },\\r\\n  {\\r\\n    \\"$id\\": \\"5\\",\\r\\n    \\"Name\\": \\"Tenant1\\",\\r\\n    \\"Type\\": \\"account\\"\\r\\n  }\\r\\n]"\n    },\n    {\n        "TimeRecorded": "2020-09-20T07:56:13Z",\n        "AlertName": "Alert2",\n        "Entities": "[\\r\\n  {\\r\\n    \\"$id\\": \\"3\\",\\r\\n    \\"Address\\": \\"160.160.100.1\\",\\r\\n    \\"Type\\": \\"ip\\"\\r\\n  },\\r\\n  {\\r\\n    \\"$id\\": \\"4\\",\\r\\n    \\"DnsDomain\\": \\"example.com\\",\\r\\n    \\"HostName\\": \\"MyMachine2\\",\\r\\n    \\"Type\\": \\"host\\"\\r\\n  },\\r\\n  {\\r\\n    \\"$id\\": \\"5\\",\\r\\n    \\"Name\\": \\"Tenant2\\",\\r\\n    \\"Type\\": \\"account\\"\\r\\n  }\\r\\n]"\n    },\n    {\n        "TimeRecorded": "2020-09-20T05:56:14Z",\n        "AlertName": "Alert3",\n        "Entities": "[\\r\\n  {\\r\\n    \\"$id\\": \\"3\\",\\r\\n    \\"Address\\": \\"160.160.100.3\\",\\r\\n    \\"Type\\": \\"ip\\"\\r\\n  },\\r\\n  {\\r\\n    \\"$id\\": \\"4\\",\\r\\n    \\"DnsDomain\\": \\"example.com\\",\\r\\n    \\"HostName\\": \\"MyMachine3\\",\\r\\n    \\"Type\\": \\"host\\"\\r\\n  },\\r\\n  {\\r\\n    \\"$id\\": \\"5\\",\\r\\n    \\"Name\\": \\"Tenant3\\",\\r\\n    \\"Type\\": \\"account\\"\\r\\n  }\\r\\n]"\n    }\n]'

I tried a couple of for loops over data['Entities'][0], but I either get TypeError: list indices must be integers or slices, not str, or the output is just [. What would be a neat way to access the values in each entity?

for i in data:
     print(i['Entities'])
mkrieger1
HieDop
  • Where did you get the data, and did you do anything to it before you posted it? It's JSON containing encoded JSON... – MattDMo Sep 27 '20 at 19:19
  • `data` is a string, not a json structure. you probably want to start by parsing that json – njzk2 Sep 27 '20 at 21:27
  • @mkrieger1 Not at all. I guess it's because of the unstructured data and that I tried all suggestions here and still got errors such as `TypeError: string indices must be integers`, `TypeError: the JSON object must be str, bytes or bytearray, not list`... – HieDop Sep 27 '20 at 22:58

3 Answers


You can use json.loads() to parse this oddly-structured JSON twice.

import json
from pprint import pprint

newdata = json.loads(data)  # creates a list of dicts; each "Entities" value is still a JSON string
for alert in newdata:
    alert["Entities"] = json.loads(alert["Entities"])

pprint(newdata)

yields

[{'AlertName': 'Alert1',
  'Entities': [{'$id': '3', 'Address': '160.160.100.2', 'Type': 'ip'},
               {'$id': '4',
                'DnsDomain': 'example.com',
                'HostName': 'MyMachine1',
                'Type': 'host'},
               {'$id': '5', 'Name': 'Tenant1', 'Type': 'account'}],
  'TimeRecorded': '2020-09-20T08:56:12Z'},
 {'AlertName': 'Alert2',
  'Entities': [{'$id': '3', 'Address': '160.160.100.1', 'Type': 'ip'},
               {'$id': '4',
                'DnsDomain': 'example.com',
                'HostName': 'MyMachine2',
                'Type': 'host'},
               {'$id': '5', 'Name': 'Tenant2', 'Type': 'account'}],
  'TimeRecorded': '2020-09-20T07:56:13Z'},
 {'AlertName': 'Alert3',
  'Entities': [{'$id': '3', 'Address': '160.160.100.3', 'Type': 'ip'},
               {'$id': '4',
                'DnsDomain': 'example.com',
                'HostName': 'MyMachine3',
                'Type': 'host'},
               {'$id': '5', 'Name': 'Tenant3', 'Type': 'account'}],
  'TimeRecorded': '2020-09-20T05:56:14Z'}]

Please note that the numbers in the "$id" fields are stored as text, so if you need those values you'll need to use int() on them.
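To get from the parsed structure to what the question asks for (the account name if there is one, otherwise the IP), something along these lines might work. This is only a sketch: `name_or_ip` is a hypothetical helper, the sample is a shortened stand-in for the question's data, and it assumes each alert has at most one entity of each Type.

```python
import json

def name_or_ip(entities):
    """Return the account Name if present, otherwise the ip Address, else None."""
    by_type = {}
    for entity in entities:
        # Keep the first entity seen for each Type (assumption: one per Type).
        by_type.setdefault(entity.get("Type"), entity)
    account = by_type.get("account", {})
    ip = by_type.get("ip", {})
    return account.get("Name") or ip.get("Address")

# Shortened sample in the same shape as the question's data:
# outer JSON whose "Entities" values are JSON-encoded strings.
sample = json.dumps([
    {"AlertName": "Alert1", "Entities": json.dumps([
        {"$id": "3", "Address": "160.160.100.2", "Type": "ip"},
        {"$id": "5", "Name": "Tenant1", "Type": "account"},
    ])},
    {"AlertName": "Alert2", "Entities": json.dumps([
        {"$id": "3", "Address": "160.160.100.1", "Type": "ip"},
    ])},
])

results = {}
for alert in json.loads(sample):              # first parse: list of alerts
    entities = json.loads(alert["Entities"])  # second parse: list of entities
    results[alert["AlertName"]] = name_or_ip(entities)

print(results)  # {'Alert1': 'Tenant1', 'Alert2': '160.160.100.1'}
```

Using `.get()` throughout means an alert with neither an account nor an ip entity yields `None` instead of raising `KeyError`.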

MattDMo
  • It popped up this error: `raise ValueError(f'malformed node or string: {node!r}') ValueError: malformed node or string: `. I just hope that there's a way to loop through anything under `Entities` and get `Name` if there's one available. – HieDop Sep 27 '20 at 23:00
  • @HieDop My code works with the data you posted in the question without any errors - I copied it directly out of the REPL I used to test it. If you are using different data or different code, please ask a new question, making sure to include the data you're using and the full text of the error you're getting. – MattDMo Sep 28 '20 at 16:14

Parse data with json.loads first (as posted, data is still a string), then iterate over the resulting list and call json.loads again on each alert's Entities value, because that value is itself a JSON-encoded string.

import json
alerts = json.loads(data)  # outer JSON: a list of alert dicts
for d in alerts:
    processed = json.loads(d['Entities'])  # inner JSON: a list of entity dicts
    # do whatever you need to do here with processed
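The two errors quoted in the question's comments are what you would see when the parse step is applied at the wrong level: indexing a character of an unparsed string with a key gives `string indices must be integers`, and calling `json.loads` on something already parsed gives `the JSON object must be str, bytes or bytearray, not list`. If it is unclear which form the API hands back, a tolerant variant could look like the sketch below. `as_parsed` and `entities_per_alert` are hypothetical names, not part of any library.

```python
import json

def as_parsed(value):
    """Parse value with json.loads only if it is still text; otherwise return it unchanged."""
    if isinstance(value, (str, bytes, bytearray)):
        return json.loads(value)
    return value

def entities_per_alert(data):
    """Return one list of entity dicts per alert, whether or not data is already parsed."""
    alerts = as_parsed(data)
    return [as_parsed(alert["Entities"]) for alert in alerts]

# Tiny stand-in for the question's data, built with json.dumps to avoid hand-escaping.
raw = json.dumps([{"AlertName": "A", "Entities": json.dumps([{"Type": "ip"}])}])

from_text = entities_per_alert(raw)                # input is a JSON string
from_objects = entities_per_alert(json.loads(raw)) # input is already a list of dicts
print(from_text == from_objects)
```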
Kevin Sheng

Your data is formatted in an unusual and cumbersome way: it is JSON in which each "Entities" value is itself a string holding more encoded data. Because this particular sample contains only strings, lists, and dictionaries, it is also a valid string representation of a Python literal, so it can be unpacked by calling ast.literal_eval() twice, as illustrated below. As you should be able to see from the output, the contents of each alert can now be accessed by key in a list of dictionaries.

from ast import literal_eval
from pprint import pprint


data = '[\n    {\n        "TimeRecorded": "2020-09-20T08:56:12Z",\n        "AlertName": "Alert1",\n        "Entities": "[\\r\\n  {\\r\\n    \\"$id\\": \\"3\\",\\r\\n    \\"Address\\": \\"160.160.100.2\\",\\r\\n    \\"Type\\": \\"ip\\"\\r\\n  },\\r\\n  {\\r\\n    \\"$id\\": \\"4\\",\\r\\n    \\"DnsDomain\\": \\"example.com\\",\\r\\n    \\"HostName\\": \\"MyMachine1\\",\\r\\n    \\"Type\\": \\"host\\"\\r\\n  },\\r\\n  {\\r\\n    \\"$id\\": \\"5\\",\\r\\n    \\"Name\\": \\"Tenant1\\",\\r\\n    \\"Type\\": \\"account\\"\\r\\n  }\\r\\n]"\n    },\n    {\n        "TimeRecorded": "2020-09-20T07:56:13Z",\n        "AlertName": "Alert2",\n        "Entities": "[\\r\\n  {\\r\\n    \\"$id\\": \\"3\\",\\r\\n    \\"Address\\": \\"160.160.100.1\\",\\r\\n    \\"Type\\": \\"ip\\"\\r\\n  },\\r\\n  {\\r\\n    \\"$id\\": \\"4\\",\\r\\n    \\"DnsDomain\\": \\"example.com\\",\\r\\n    \\"HostName\\": \\"MyMachine2\\",\\r\\n    \\"Type\\": \\"host\\"\\r\\n  },\\r\\n  {\\r\\n    \\"$id\\": \\"5\\",\\r\\n    \\"Name\\": \\"Tenant2\\",\\r\\n    \\"Type\\": \\"account\\"\\r\\n  }\\r\\n]"\n    },\n    {\n        "TimeRecorded": "2020-09-20T05:56:14Z",\n        "AlertName": "Alert3",\n        "Entities": "[\\r\\n  {\\r\\n    \\"$id\\": \\"3\\",\\r\\n    \\"Address\\": \\"160.160.100.3\\",\\r\\n    \\"Type\\": \\"ip\\"\\r\\n  },\\r\\n  {\\r\\n    \\"$id\\": \\"4\\",\\r\\n    \\"DnsDomain\\": \\"example.com\\",\\r\\n    \\"HostName\\": \\"MyMachine3\\",\\r\\n    \\"Type\\": \\"host\\"\\r\\n  },\\r\\n  {\\r\\n    \\"$id\\": \\"5\\",\\r\\n    \\"Name\\": \\"Tenant3\\",\\r\\n    \\"Type\\": \\"account\\"\\r\\n  }\\r\\n]"\n    }\n]'

data = literal_eval(data)
for alert in data:
    alert['Entities'] = literal_eval(alert['Entities'])

pprint(data, sort_dicts=False)

Output:

[{'TimeRecorded': '2020-09-20T08:56:12Z',
  'AlertName': 'Alert1',
  'Entities': [{'$id': '3', 'Address': '160.160.100.2', 'Type': 'ip'},
               {'$id': '4',
                'DnsDomain': 'example.com',
                'HostName': 'MyMachine1',
                'Type': 'host'},
               {'$id': '5', 'Name': 'Tenant1', 'Type': 'account'}]},
 {'TimeRecorded': '2020-09-20T07:56:13Z',
  'AlertName': 'Alert2',
  'Entities': [{'$id': '3', 'Address': '160.160.100.1', 'Type': 'ip'},
               {'$id': '4',
                'DnsDomain': 'example.com',
                'HostName': 'MyMachine2',
                'Type': 'host'},
               {'$id': '5', 'Name': 'Tenant2', 'Type': 'account'}]},
 {'TimeRecorded': '2020-09-20T05:56:14Z',
  'AlertName': 'Alert3',
  'Entities': [{'$id': '3', 'Address': '160.160.100.3', 'Type': 'ip'},
               {'$id': '4',
                'DnsDomain': 'example.com',
                'HostName': 'MyMachine3',
                'Type': 'host'},
               {'$id': '5', 'Name': 'Tenant3', 'Type': 'account'}]}]
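One caveat worth checking against the real data: `ast.literal_eval()` only accepts Python literals, so it rejects the JSON keywords `true`, `false`, and `null`, while `json.loads()` handles them. The entity below is hypothetical (the posted sample contains no such values), but if the full API response does, that may explain a `malformed node or string` error like the one quoted in the comments; this cannot be confirmed from the posted data.

```python
import json
from ast import literal_eval

# Hypothetical entity containing JSON keywords that are not Python literals.
text = '{"Active": true, "Owner": null}'

parsed = json.loads(text)  # JSON true/null become Python True/None

try:
    literal_eval(text)
    literal_eval_ok = True
except ValueError:
    # literal_eval sees `true` and `null` as bare names, not literals.
    literal_eval_ok = False

print(parsed, literal_eval_ok)
```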
martineau
  • I got this error `TypeError: string indices must be integers`. The data I got was from my internal API system. Initially it had more keys and values and I shortened it further to just what's included in this post. – HieDop Sep 27 '20 at 23:01
  • All I can go on is what you posted in your question, and the code in my answer works as claimed with no exception being raised. You must have changed the data improperly or left out some other detail. – martineau Sep 27 '20 at 23:11