I have a Python script that reads all the .txt files in a directory and checks whether each one returns True or False for the conditions in my script. I have thousands of .txt files containing JSON-formatted text. However, I'm getting an error saying the JSON is invalid, even though I have checked that my text files are valid JSON. I want the script to determine whether each .txt file matches any of the conditions in my code below, and then write the result to a CSV file. I have included my error message and an example .txt file. Your help is very much appreciated!
Example .txt file with JSON formatting
{
    "domain_siblings": [
        "try.wisebuygroup.com.au",
        "www.wisebuygroup.com.au"
    ],
    "resolutions": [
        {
            "ip_address": "34.238.73.135",
            "last_resolved": "2018-04-22 17:59:05"
        },
        {
            "ip_address": "52.0.100.49",
            "last_resolved": "2018-06-24 17:05:06"
        },
        {
            "ip_address": "52.204.226.220",
            "last_resolved": "2018-04-22 17:59:06"
        },
        {
            "ip_address": "52.22.224.230",
            "last_resolved": "2018-06-24 17:05:06"
        }
    ],
    "response_code": 1,
    "verbose_msg": "Domain found in dataset",
    "whois": null
}
Error message
line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
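For context, the json module raises exactly this message when the text it is given is empty or does not begin with a JSON value, so the failure may come from one particular file rather than from every file. A minimal sketch for narrowing that down is shown here; the helper name `find_unparseable_files` is hypothetical, and it assumes the files are UTF-8 encoded (the `utf-8-sig` codec also skips a leading byte-order mark if there is one).

```python
import json
import os


def find_unparseable_files(directory):
    """Return (filename, reason) pairs for files that cannot be parsed as JSON.

    Hypothetical helper for diagnosis only; assumes UTF-8 encoded files.
    """
    problems = []
    for filename in sorted(os.listdir(directory)):
        filepath = os.path.join(directory, filename)
        if not os.path.isfile(filepath):
            continue  # skip subdirectories
        try:
            # utf-8-sig strips a leading byte-order mark if one is present
            with open(filepath, 'r', encoding='utf-8-sig') as handle:
                text = handle.read()
            if not text.strip():
                problems.append((filename, 'file is empty'))
                continue
            json.loads(text)
        except (UnicodeDecodeError, json.JSONDecodeError) as err:
            # Record the reason instead of stopping at the first bad file
            problems.append((filename, str(err)))
    return problems
```

Calling `find_unparseable_files('./output/')` and printing the returned pairs would show which files, if any, are empty or malformed, without stopping at the first failure.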
Code
import os
import json
import csv

path = r'./output/'
csvpath = 'C:/Users/xxx/Documents/csvtest'
file_n = 'file.csv'

def vt_result_check(filepath):
    """Return True if the report in filepath matches any of the conditions."""
    with open(filepath, 'r') as vt_result_file:
        vt_data = json.load(vt_result_file)
    # Look for any positive detected referrer samples
    # Look for any positive detected communicating samples
    # Look for any positive detected downloaded samples
    # Look for any positive detected URLs
    sample_types = ('detected_referrer_samples', 'detected_communicating_samples',
                    'detected_downloaded_samples', 'detected_urls')
    vt_result = any(sample['positives'] > 0 for sample_type in sample_types
                    for sample in vt_data.get(sample_type, []))
    # Look for a Dr.Web category of known infection source
    vt_result |= vt_data.get('Dr.Web category') == "known infection source"
    # Look for a Forcepoint ThreatSeeker category of elevated exposure
    # Look for a Forcepoint ThreatSeeker category of phishing and other frauds
    # Look for a Forcepoint ThreatSeeker category of suspicious content
    threats = ("elevated exposure", "phishing and other frauds", "suspicious content")
    vt_result |= vt_data.get('Forcepoint ThreatSeeker category') in threats
    return vt_result

if __name__ == '__main__':
    # Write one CSV row per .txt file: the filename and its True/False result
    with open(file_n, 'w', newline='') as output:
        writer = csv.writer(output)
        for filename in os.listdir(path):
            if filename.endswith('.txt'):
                result = vt_result_check(os.path.join(path, filename))
                writer.writerow([filename, result])