
I have a txt file that contains rows that look like dictionaries. Here is the first line for example:

'{"host":{"name":"LAPTOP-OUNS3BEM"},"log":{"level":"information"},"winlog":{"channel":"Microsoft-Windows-Sysmon/Operational","process":{"thread":{"id":5188},"pid":5060},"user":{"domain":"NT AUTHORITY","identifier":"S-1-5-18","name":"SYSTEM","type":"Well Known Group"},"event_data":{"ProcessId":"24244","Image":"C:\\\\Program Files\\\\Synaptics\\\\SynTP\\\\SynTPEnh.exe","User":"LAPTOP-OUNS3BEM\\\\Akshay Bahade","RuleName":"-","UtcTime":"2022-06-13 02:56:33.129","ProcessGuid":"{2017f52f-a75f-62a6-e79a-000000001b00}"},"provider_guid":"{5770385f-c22a-43e0-bf4c-06f5698ffbd9}","opcode":"Info","version":3,"provider_name":"Microsoft-Windows-Sysmon","record_id":80594,"event_id":"5","computer_name":"LAPTOP-OUNS3BEM","api":"wineventlog","task":"Process terminated (rule: ProcessTerminate)"},"type":"windows_usb","@timestamp":"2022-06-13T02:56:33.129Z","event":{"kind":"event","action":"Process terminated (rule: ProcessTerminate)","code":"5","provider":"Microsoft-Windows-Sysmon","created":"2022-06-13T02:57:19.636Z"},"ecs":{"version":"8.0.0"},"@version":"1","agent":{"ephemeral_id":"3a7b40b3-e6ac-4486-b403-10d86283d7dc","id":"1858c9f1-b457-484f-b064-56171362bad9","name":"LAPTOP-OUNS3BEM","version":"8.2.2","type":"winlogbeat"},"tags":["sysmon","beats_input_codec_plain_applied"],"message":"Process terminated:\\nRuleName: -\\nUtcTime: 2022-06-13 02:56:33.129\\nProcessGuid: {2017f52f-a75f-62a6-e79a-000000001b00}\\nProcessId: 24244\\nImage: C:\\\\Program Files\\\\Synaptics\\\\SynTP\\\\SynTPEnh.exe\\nUser: LAPTOP-OUNS3BEM\\\\username"}'

I am trying to read this file as JSON, but I keep getting errors:

import json

f = open("usb1.txt", "r")
contents = f.read()
json.loads(contents)

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
<ipython-input-76-469ad981b8c6> in <module>
----> 1 json.loads(contents)

c:\users\admin\appdata\local\programs\python\python37\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    346             parse_int is None and parse_float is None and
    347             parse_constant is None and object_pairs_hook is None and not kw):
--> 348         return _default_decoder.decode(s)
    349     if cls is None:
    350         cls = JSONDecoder

c:\users\admin\appdata\local\programs\python\python37\lib\json\decoder.py in decode(self, s, _w)
    338         end = _w(s, end).end()
    339         if end != len(s):
--> 340             raise JSONDecodeError("Extra data", s, end)
    341         return obj
    342 

JSONDecodeError: Extra data: line 2 column 1 (char 1515)

So, I tried splitting the file into rows and checking whether a single row can be converted to JSON:

lis = contents.split('username"}')
s = lis[0] + '''username"}'''

And it works this way:

json.loads(s)

{'host': {'name': 'LAPTOP-OUNS3BEM'},
 'log': {'level': 'information'},
 'winlog': {'channel': 'Microsoft-Windows-Sysmon/Operational',
  'process': {'thread': {'id': 5188}, 'pid': 5060},
  'user': {'domain': 'NT AUTHORITY',
   'identifier': 'S-1-5-18',
   'name': 'SYSTEM',
   'type': 'Well Known Group'},
  'event_data': {'ProcessId': '24244',
   'Image': 'C:\\Program Files\\Synaptics\\SynTP\\SynTPEnh.exe',
   'User': 'LAPTOP-OUNS3BEM\\username',
   'RuleName': '-',
   'UtcTime': '2022-06-13 02:56:33.129',
   'ProcessGuid': '{2017f52f-a75f-62a6-e79a-000000001b00}'},
  'provider_guid': '{5770385f-c22a-43e0-bf4c-06f5698ffbd9}',
  'opcode': 'Info',
  'version': 3,
  'provider_name': 'Microsoft-Windows-Sysmon',
  'record_id': 80594,
  'event_id': '5',
  'computer_name': 'LAPTOP-OUNS3BEM',
  'api': 'wineventlog',
  'task': 'Process terminated (rule: ProcessTerminate)'},
 'type': 'windows_usb',
 '@timestamp': '2022-06-13T02:56:33.129Z',
 'event': {'kind': 'event',
  'action': 'Process terminated (rule: ProcessTerminate)',
  'code': '5',
  'provider': 'Microsoft-Windows-Sysmon',
  'created': '2022-06-13T02:57:19.636Z'},
 'ecs': {'version': '8.0.0'},
 '@version': '1',
 'agent': {'ephemeral_id': '3a7b40b3-e6ac-4486-b403-10d86283d7dc',
  'id': '1858c9f1-b457-484f-b064-56171362bad9',
  'name': 'LAPTOP-OUNS3BEM',
  'version': '8.2.2',
  'type': 'winlogbeat'},
 'tags': ['sysmon', 'beats_input_codec_plain_applied'],
 'message': 'Process terminated:\nRuleName: -\nUtcTime: 2022-06-13 02:56:33.129\nProcessGuid: {2017f52f-a75f-62a6-e79a-000000001b00}\nProcessId: 24244\nImage: C:\\Program Files\\Synaptics\\SynTP\\SynTPEnh.exe\nUser: LAPTOP-OUNS3BEM\\username'}

So, what's the problem when trying to load the whole file as JSON, and how do I do it?


I'm providing a sample of 5 rows here that you can save in a txt file to try out your analysis:

{"host":{"name":"LAPTOP-OUNS3BEM"},"log":{"level":"information"},"winlog":{"channel":"Microsoft-Windows-Sysmon/Operational","process":{"thread":{"id":5188},"pid":5060},"user":{"domain":"NT AUTHORITY","identifier":"S-1-5-18","name":"SYSTEM","type":"Well Known Group"},"event_data":{"ProcessId":"24244","Image":"C:\\Program Files\\Synaptics\\SynTP\\SynTPEnh.exe","User":"LAPTOP-OUNS3BEM\\username","RuleName":"-","UtcTime":"2022-06-13 02:56:33.129","ProcessGuid":"{2017f52f-a75f-62a6-e79a-000000001b00}"},"provider_guid":"{5770385f-c22a-43e0-bf4c-06f5698ffbd9}","opcode":"Info","version":3,"provider_name":"Microsoft-Windows-Sysmon","record_id":80594,"event_id":"5","computer_name":"LAPTOP-OUNS3BEM","api":"wineventlog","task":"Process terminated (rule: ProcessTerminate)"},"type":"windows_usb","@timestamp":"2022-06-13T02:56:33.129Z","event":{"kind":"event","action":"Process terminated (rule: ProcessTerminate)","code":"5","provider":"Microsoft-Windows-Sysmon","created":"2022-06-13T02:57:19.636Z"},"ecs":{"version":"8.0.0"},"@version":"1","agent":{"ephemeral_id":"3a7b40b3-e6ac-4486-b403-10d86283d7dc","id":"1858c9f1-b457-484f-b064-56171362bad9","name":"LAPTOP-OUNS3BEM","version":"8.2.2","type":"winlogbeat"},"tags":["sysmon","beats_input_codec_plain_applied"],"message":"Process terminated:\nRuleName: -\nUtcTime: 2022-06-13 02:56:33.129\nProcessGuid: {2017f52f-a75f-62a6-e79a-000000001b00}\nProcessId: 24244\nImage: C:\\Program Files\\Synaptics\\SynTP\\SynTPEnh.exe\nUser: LAPTOP-OUNS3BEM\\username"}
{"host":{"name":"LAPTOP-OUNS3BEM"},"log":{"level":"information"},"winlog":{"channel":"Microsoft-Windows-Sysmon/Operational","process":{"thread":{"id":5188},"pid":5060},"user":{"domain":"NT AUTHORITY","identifier":"S-1-5-18","name":"SYSTEM","type":"Well Known Group"},"event_data":{"ProcessId":"14116","Image":"C:\\Program Files\\Synaptics\\SynTP\\SynTPEnh.exe","User":"LAPTOP-OUNS3BEM\\username","UtcTime":"2022-06-13 02:56:43.150","RuleName":"-","ProcessGuid":"{2017f52f-a76a-62a6-e99a-000000001b00}"},"provider_guid":"{5770385f-c22a-43e0-bf4c-06f5698ffbd9}","opcode":"Info","version":3,"provider_name":"Microsoft-Windows-Sysmon","event_id":"5","record_id":80600,"computer_name":"LAPTOP-OUNS3BEM","api":"wineventlog","task":"Process terminated (rule: ProcessTerminate)"},"type":"windows_usb","@timestamp":"2022-06-13T02:56:43.152Z","event":{"kind":"event","code":"5","action":"Process terminated (rule: ProcessTerminate)","provider":"Microsoft-Windows-Sysmon","created":"2022-06-13T02:57:19.636Z"},"ecs":{"version":"8.0.0"},"@version":"1","tags":["sysmon","beats_input_codec_plain_applied"],"agent":{"ephemeral_id":"3a7b40b3-e6ac-4486-b403-10d86283d7dc","id":"1858c9f1-b457-484f-b064-56171362bad9","name":"LAPTOP-OUNS3BEM","version":"8.2.2","type":"winlogbeat"},"message":"Process terminated:\nRuleName: -\nUtcTime: 2022-06-13 02:56:43.150\nProcessGuid: {2017f52f-a76a-62a6-e99a-000000001b00}\nProcessId: 14116\nImage: C:\\Program Files\\Synaptics\\SynTP\\SynTPEnh.exe\nUser: LAPTOP-OUNS3BEM\\username"}
{"host":{"name":"LAPTOP-OUNS3BEM"},"log":{"level":"information"},"winlog":{"channel":"Microsoft-Windows-Sysmon/Operational","process":{"thread":{"id":5188},"pid":5060},"user":{"domain":"NT AUTHORITY","name":"SYSTEM","identifier":"S-1-5-18","type":"Well Known Group"},"event_data":{"Hashes":"SHA256=293528F0473244C095F115119AB7B4FE622DDF3799B10EE69927678F2EDC0380","UtcTime":"2022-06-13 02:56:57.130","ParentImage":"-","FileVersion":"19.3.4.228 06May21","Product":"Synaptics Pointing Device Driver","Description":"Synaptics TouchPad 64-bit Enhancements","TerminalSessionId":"1","LogonGuid":"{2017f52f-44c4-629d-b28b-020000000000}","RuleName":"-","IntegrityLevel":"Medium","ParentProcessId":"4964","ParentProcessGuid":"{00000000-0000-0000-0000-000000000000}","CurrentDirectory":"C:\\WINDOWS\\system32\\","ProcessId":"10452","Image":"C:\\Program Files\\Synaptics\\SynTP\\SynTPEnh.exe","CommandLine":"\"C:\\Program Files\\Synaptics\\SynTP\\SynTPEnh.exe\"","ParentCommandLine":"-","ParentUser":"-","Company":"Synaptics Incorporated","User":"LAPTOP-OUNS3BEM\\username","LogonId":"0x28bb2","ProcessGuid":"{2017f52f-a779-62a6-ec9a-000000001b00}","OriginalFileName":"SynTPEnh.exe"},"provider_guid":"{5770385f-c22a-43e0-bf4c-06f5698ffbd9}","opcode":"Info","version":5,"provider_name":"Microsoft-Windows-Sysmon","event_id":"1","record_id":80606,"computer_name":"LAPTOP-OUNS3BEM","api":"wineventlog","task":"Process Create (rule: ProcessCreate)"},"type":"windows_usb","@timestamp":"2022-06-13T02:56:57.139Z","event":{"kind":"event","code":"1","action":"Process Create (rule: ProcessCreate)","provider":"Microsoft-Windows-Sysmon","created":"2022-06-13T02:57:19.636Z"},"ecs":{"version":"8.0.0"},"@version":"1","tags":["sysmon","beats_input_codec_plain_applied"],"agent":{"ephemeral_id":"3a7b40b3-e6ac-4486-b403-10d86283d7dc","id":"1858c9f1-b457-484f-b064-56171362bad9","name":"LAPTOP-OUNS3BEM","version":"8.2.2","type":"winlogbeat"},"message":"Process Create:\nRuleName: -\nUtcTime: 2022-06-13 
02:56:57.130\nProcessGuid: {2017f52f-a779-62a6-ec9a-000000001b00}\nProcessId: 10452\nImage: C:\\Program Files\\Synaptics\\SynTP\\SynTPEnh.exe\nFileVersion: 19.3.4.228 06May21\nDescription: Synaptics TouchPad 64-bit Enhancements\nProduct: Synaptics Pointing Device Driver\nCompany: Synaptics Incorporated\nOriginalFileName: SynTPEnh.exe\nCommandLine: \"C:\\Program Files\\Synaptics\\SynTP\\SynTPEnh.exe\"\nCurrentDirectory: C:\\WINDOWS\\system32\\\nUser: LAPTOP-OUNS3BEM\\username\nLogonGuid: {2017f52f-44c4-629d-b28b-020000000000}\nLogonId: 0x28BB2\nTerminalSessionId: 1\nIntegrityLevel: Medium\nHashes: SHA256=293528F0473244C095F115119AB7B4FE622DDF3799B10EE69927678F2EDC0380\nParentProcessGuid: {00000000-0000-0000-0000-000000000000}\nParentProcessId: 4964\nParentImage: -\nParentCommandLine: -\nParentUser: -"}
{"host":{"name":"LAPTOP-OUNS3BEM"},"log":{"level":"information"},"winlog":{"channel":"Microsoft-Windows-Sysmon/Operational","process":{"thread":{"id":5188},"pid":5060},"user":{"domain":"NT AUTHORITY","name":"SYSTEM","identifier":"S-1-5-18","type":"Well Known Group"},"event_data":{"Hashes":"SHA256=293528F0473244C095F115119AB7B4FE622DDF3799B10EE69927678F2EDC0380","UtcTime":"2022-06-13 02:57:12.178","FileVersion":"19.3.4.228 06May21","ParentImage":"-","Product":"Synaptics Pointing Device Driver","Description":"Synaptics TouchPad 64-bit Enhancements","TerminalSessionId":"1","LogonGuid":"{2017f52f-44c4-629d-b28b-020000000000}","RuleName":"-","IntegrityLevel":"Medium","ParentProcessGuid":"{00000000-0000-0000-0000-000000000000}","ParentProcessId":"4964","CurrentDirectory":"C:\\WINDOWS\\system32\\","ProcessId":"27096","Image":"C:\\Program Files\\Synaptics\\SynTP\\SynTPEnh.exe","CommandLine":"\"C:\\Program Files\\Synaptics\\SynTP\\SynTPEnh.exe\"","ParentCommandLine":"-","ParentUser":"-","Company":"Synaptics Incorporated","User":"LAPTOP-OUNS3BEM\\username","LogonId":"0x28bb2","ProcessGuid":"{2017f52f-a788-62a6-ef9a-000000001b00}","OriginalFileName":"SynTPEnh.exe"},"provider_guid":"{5770385f-c22a-43e0-bf4c-06f5698ffbd9}","opcode":"Info","version":5,"provider_name":"Microsoft-Windows-Sysmon","record_id":80612,"event_id":"1","computer_name":"LAPTOP-OUNS3BEM","api":"wineventlog","task":"Process Create (rule: ProcessCreate)"},"type":"windows_usb","@timestamp":"2022-06-13T02:57:12.191Z","event":{"kind":"event","action":"Process Create (rule: ProcessCreate)","code":"1","provider":"Microsoft-Windows-Sysmon","created":"2022-06-13T02:57:19.637Z"},"ecs":{"version":"8.0.0"},"@version":"1","tags":["sysmon","beats_input_codec_plain_applied"],"agent":{"ephemeral_id":"3a7b40b3-e6ac-4486-b403-10d86283d7dc","id":"1858c9f1-b457-484f-b064-56171362bad9","name":"LAPTOP-OUNS3BEM","version":"8.2.2","type":"winlogbeat"},"message":"Process Create:\nRuleName: -\nUtcTime: 2022-06-13 
02:57:12.178\nProcessGuid: {2017f52f-a788-62a6-ef9a-000000001b00}\nProcessId: 27096\nImage: C:\\Program Files\\Synaptics\\SynTP\\SynTPEnh.exe\nFileVersion: 19.3.4.228 06May21\nDescription: Synaptics TouchPad 64-bit Enhancements\nProduct: Synaptics Pointing Device Driver\nCompany: Synaptics Incorporated\nOriginalFileName: SynTPEnh.exe\nCommandLine: \"C:\\Program Files\\Synaptics\\SynTP\\SynTPEnh.exe\"\nCurrentDirectory: C:\\WINDOWS\\system32\\\nUser: LAPTOP-OUNS3BEM\\username\nLogonGuid: {2017f52f-44c4-629d-b28b-020000000000}\nLogonId: 0x28BB2\nTerminalSessionId: 1\nIntegrityLevel: Medium\nHashes: SHA256=293528F0473244C095F115119AB7B4FE622DDF3799B10EE69927678F2EDC0380\nParentProcessGuid: {00000000-0000-0000-0000-000000000000}\nParentProcessId: 4964\nParentImage: -\nParentCommandLine: -\nParentUser: -"}
{"host":{"name":"LAPTOP-OUNS3BEM"},"log":{"level":"information"},"winlog":{"channel":"Security","process":{"thread":{"id":6604},"pid":884},"activity_id":"{13741580-7939-0002-d015-74133979d801}","event_data":{"ProcessCreationTime":"2022-06-06T00:06:06.4289200Z","SubjectUserSid":"S-1-5-21-1348782422-1367626683-3692934514-1003","ReadOperation":"%%8099","SubjectDomainName":"LAPTOP-OUNS3BEM","ClientProcessId":"15324","SubjectLogonId":"0x28bb2","Type":"1","TargetName":"Adobe User Info(Part1)","SubjectUserName":"username","CountOfCredentialsReturned":"1","ReturnCode":"3221226021"},"provider_guid":"{54849625-5478-4994-a5ba-3e3b0328c30d}","opcode":"Info","provider_name":"Microsoft-Windows-Security-Auditing","keywords":["Audit Success"],"event_id":"5379","record_id":39779,"computer_name":"LAPTOP-OUNS3BEM","api":"wineventlog","task":"User Account Management"},"type":"windows_usb","@timestamp":"2022-06-13T02:57:06.540Z","event":{"code":"5379","action":"User Account Management","outcome":"success","kind":"event","provider":"Microsoft-Windows-Security-Auditing","created":"2022-06-13T02:57:19.923Z"},"ecs":{"version":"8.0.0"},"@version":"1","agent":{"ephemeral_id":"3a7b40b3-e6ac-4486-b403-10d86283d7dc","id":"1858c9f1-b457-484f-b064-56171362bad9","name":"LAPTOP-OUNS3BEM","version":"8.2.2","type":"winlogbeat"},"tags":["beats_input_codec_plain_applied"],"message":"Credential Manager credentials were read.\n\nSubject:\n\tSecurity ID:\t\tS-1-5-21-1348782422-1367626683-3692934514-1003\n\tAccount Name:\t\tusername\n\tAccount Domain:\t\tLAPTOP-OUNS3BEM\n\tLogon ID:\t\t0x28BB2\n\tRead Operation:\t\tRead Credential\n\nThis event occurs when a user performs a read operation on stored credentials in Credential Manager."}
Kristada673
  • You're missing a ```"``` at the end of that json string and before the closing ```}```. – ewokx Jun 14 '22 at 05:34
  • is the content in the file placed as a string? I mean all content is enclosed within leading and trailing commas? – HamzaFarooq Jun 14 '22 at 05:35
  • The `"` at the end is there. I still keep getting the error. You can try with the sample data of 5 rows I provided at the end. Try to read it as a json with `json.loads()`, and you'll get the same error. In fact, try it with only the first 2 rows, you'll still get the error. But when trying to read each individual row as a JSON, the error doesn't occur. – Kristada673 Jun 14 '22 at 05:46
  • Multiple valid JSON strings concatenated together is not a valid JSON string. – gre_gor Jun 14 '22 at 05:58
  • Does this answer your question? [Python json.loads shows ValueError: Extra data](https://stackoverflow.com/questions/21058935/python-json-loads-shows-valueerror-extra-data) – gre_gor Jun 14 '22 at 05:58

1 Answer


There are a couple of inconsistencies here. First, the first JSON string provided has some issues, the main one being trailing commas in a dict item:
[image: trailing commas]

The other issue is the leading and trailing single quotes, but that's easily fixed, and judging by the last list of values you provided, it isn't a problem in the actual file.

Note that the last list of JSON objects is actually a valid file format called NDJSON (newline-delimited JSON). An NDJSON file is basically a sequence of JSON objects, one per line, separated by newline characters.
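To make the distinction concrete, here is a minimal sketch using only the standard library. It assumes every record sits on exactly one line; the helper name `parse_ndjson` and the two-record `sample` string are made up for illustration and are not part of the question's data.

```python
import json


def parse_ndjson(text):
    """Parse newline-delimited JSON: one JSON document per non-blank line."""
    records = []
    # Split on "\n" only; str.splitlines() would also split on characters
    # such as U+2028 that may legally appear inside a JSON string.
    for line_number, line in enumerate(text.split("\n"), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. after a trailing newline
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc
    return records


# Hypothetical two-record sample standing in for the real log file.
sample = '{"event_id": "5", "record_id": 80594}\n{"event_id": "1", "record_id": 80606}\n'

# Parsing the whole text in one call fails: a second document follows the first.
try:
    json.loads(sample)
    whole_file_error = None
except json.JSONDecodeError as exc:
    whole_file_error = exc.msg

# Parsing line by line succeeds.
records = parse_ndjson(sample)
```

Here `whole_file_error` ends up holding the same "Extra data" message as in the traceback from the question, while `records` is a list of two dicts.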

Python has a library or two for it, such as the third-party `ndjson` package.

To split the last list (which is actually an NDJSON file) into separate valid JSON files, you can use the logic below:

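import json

import ndjson  # third-party package: pip install ndjson


# Read every record from the NDJSON file and write each one to its own JSON file.
with open("ndjson.ndjson") as infile:
    for index, js in enumerate(ndjson.load(infile)):
        with open(f"output_json_{index}.json", "w") as outfile:
            json.dump(js, outfile, indent=4)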
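One caveat: in the sample pasted in the question, the third and fourth rows appear to wrap onto a second line inside the "message" value. If that wrapping is present in the real file and is not just a paste artifact, strict one-record-per-line parsing will reject those rows. The sketch below is one possible workaround, not a definitive fix: it walks the text with `json.JSONDecoder.raw_decode`, so record boundaries do not have to coincide with line boundaries, and passing `strict=False` lets the decoder accept raw control characters such as newlines inside strings. The function name `iter_json_documents` and the `sample` string are hypothetical.

```python
import json


def iter_json_documents(text, strict=True):
    """Yield each JSON document found in text, regardless of line breaks."""
    decoder = json.JSONDecoder(strict=strict)
    pos = 0
    length = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            return
        # raw_decode returns the parsed object and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Hypothetical sample: the second document contains a raw newline inside a string.
sample = '{"a": 1}\n{"b": "two\nlines"} {"c": 3}'

documents = list(iter_json_documents(sample, strict=False))
```

With the default `strict=True`, the same sample raises `json.JSONDecodeError` at the raw newline, which is the safer behaviour when the file is expected to be well-formed NDJSON.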
Edo Akse