I have a large JSON file to parse with Python, but it is incomplete: the closing brackets at the end are missing. The file consists of one big JSON object that contains a list of smaller JSON objects. Every inner object is complete; only the final closing brackets of the outer object are missing. For example, its structure looks like this:

{bigger_json_head:value, another_key:[{small_complete_json1},{small_complete_json2}, ...,{small_complete_json_n}, 

So the final "]}" is missing. However, each small JSON object occupies a single line, so when I print the file line by line, I get each JSON object as a single string.

So I tried:

line_arr = []
with open("file.json", "r", encoding="UTF-8") as f:
    for line in f:
        line_arr.append(line)

I expected to get a list whose elements are the lines of the file, one JSON object per line.

Then I tried the following:

import json

for json_line in line_arr:
    try:
        json_str = json.loads(json_line)
        print(json_str)
    except json.decoder.JSONDecodeError:
        continue

I expected this code block to print every JSON object to the console, except for the first and last lines. However, it printed nothing; every line just raised a decode error.

Has anyone solved a similar problem? Please help. Thank you.

quamrana
SJ K
  • Does this answer your question? [How do I automatically fix an invalid JSON string?](https://stackoverflow.com/questions/18514910/how-do-i-automatically-fix-an-invalid-json-string) – CodeIt Apr 05 '20 at 13:57
  • @CodeIt Thanks I will try that! – SJ K Apr 05 '20 at 14:44

1 Answer

If the faulty JSON file is only missing the final "]}", you can fix it before parsing it. Here is some example code to illustrate:

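import json

with open("file.json", "r", encoding="UTF-8") as f:
    faulty_json_str = f.read()

# Drop a trailing comma, if any (the question's sample ends with one), then close the list and object.
fixed_json_str = faulty_json_str.rstrip().rstrip(",") + "]}"
json_obj = json.loads(fixed_json_str)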
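
The suffix does not have to be hard-coded. A minimal sketch, assuming the file was cut off between values (not in the middle of a string, number, or key): scan the text, keep a stack of brackets that are still open, and append the matching closers in reverse order. `close_truncated_json` is a hypothetical helper name, not part of the `json` module.

```python
import json


def close_truncated_json(text):
    """Append the closers for every '{' or '[' that is still open.

    Assumption: the truncation happened between values, not inside a
    string, number, or key.
    """
    stack = []          # closers we still owe, innermost last
    in_string = False   # True while scanning inside a JSON string
    escaped = False     # True right after a backslash inside a string
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            stack.append("}")
        elif ch == "[":
            stack.append("]")
        elif ch in "}]" and stack:
            stack.pop()
    # A dangling separator comma would make the repaired text invalid.
    body = text.rstrip()
    if body.endswith(","):
        body = body[:-1]
    return body + "".join(reversed(stack))
```

With this, `json.loads(close_truncated_json(faulty_json_str))` would replace the hard-coded concatenation. Brackets inside string values are ignored, so a value such as "x]" does not confuse the count.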
Yosua
  • And the ']}' part shouldn't be hard-coded, as eventually I want to write code that fixes faulty JSON and parses it properly. – SJ K Apr 05 '20 at 14:44
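
If only the inner objects are needed, or the file is too large to read in one go, parsing line by line may also work. A likely reason the loop in the question prints nothing is that every row ends with a separator comma, which `json.loads` rejects as extra data. The sketch below strips that comma before parsing; it assumes each inner object sits on exactly one line, and `parse_object_lines` is a hypothetical helper name.

```python
import json


def parse_object_lines(lines):
    """Yield every line that parses as JSON once separator commas are removed."""
    for line in lines:
        # Remove surrounding whitespace and a leading or trailing comma.
        candidate = line.strip().strip(",").strip()
        if not candidate.startswith("{"):
            continue  # blank lines or stray brackets
        try:
            yield json.loads(candidate)
        except json.JSONDecodeError:
            # For example the header line, which opens the outer object
            # but never closes it.
            continue
```

Passing the open file object directly, as in `parse_object_lines(f)`, reads the file lazily, so the whole file never has to be held in memory.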