I have to import a JSON file in the format shown below:

{
    "name": "bob"
}
{
    "name": "sarah"
}

This is the function I am trying to use to open it:

import json

def read_json_file(file):
    with open(file, "r") as r:
        response = json.load(r)
        return response

I am getting this error when trying to load it:

json.decoder.JSONDecodeError: Extra data: line 4 column 1 (char 22)

I can't fix the JSON data by hand because the file is quite large. I need a workaround that lets me parse each dictionary in it.

I have already tried the method from this question:

Python json.loads shows ValueError: Extra data

I tried changing my function to match the top answer:

    response = json.dumps(r)

That, however, raised this error:

TypeError: Object of type TextIOWrapper is not JSON serializable

Any help would be appreciated on this.

python_help
  • The example you've given is not valid JSON – Aug 26 '21 at 16:05
  • @DarkKnight I know that, but this is the format I received it in, and the file is too large to correct, so I need a workaround – python_help Aug 26 '21 at 18:26

2 Answers

To handle this kind of "multiple" (and therefore invalid) JSON, you can read the entire file, insert commas between the objects, wrap the result in brackets [], and then parse it as a string with json.loads().

  1. Read the whole file as a string and store it in a variable.
  2. Remove all newlines.
  3. Insert a comma at each }{ boundary, so that it becomes ...},{....
  4. Wrap the result in brackets [].
  5. Use json.loads() to parse the JSON string.

Full code:

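import json

def read_json_file(file):
    with open(file, "r") as r:
        response = r.read()
        # Join everything onto one line so adjacent objects touch: ...}{...
        response = response.replace('\n', '')
        # Separate the objects with commas and wrap them in a JSON array.
        response = response.replace('}{', '},{')
        response = "[" + response + "]"
        return json.loads(response)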
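
Traced on the sample from the question, the steps above look roughly like this. It is only a sketch on an in-memory string; the variable names `raw`, `joined`, `wrapped` and `people` are made up for illustration. It also assumes that the sequence `}{` never occurs inside a string value once the newlines are gone, otherwise a comma ends up in the wrong place.

```python
import json

# Sample data in the same shape as the question's file (illustrative).
raw = '{\n    "name": "bob"\n}\n{\n    "name": "sarah"\n}\n'

# Steps 1-2: take the whole text and drop the newlines,
# so the closing and opening braces of neighbouring objects touch.
joined = raw.replace("\n", "")

# Step 3: put a comma between adjacent objects.
joined = joined.replace("}{", "},{")

# Step 4: wrap everything in brackets so it becomes one JSON array.
wrapped = "[" + joined + "]"

# Step 5: parse the array into a list of dictionaries.
people = json.loads(wrapped)
print(people)
```

Spaces inside the values are left untouched here, which matters for data such as "John Smith".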
Dhana D.
  • This is the error I get when I insert this function `json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1 (char 3)` – python_help Aug 26 '21 at 18:24
  • Sorry, I didn't notice that each brace is on its own line. I have edited my answer. – Dhana D. Aug 26 '21 at 18:33
  • I'm not sure that replacing all spaces is a good idea. Consider {'name': 'John Smith'} –  Aug 26 '21 at 18:58
  • That works! I removed replacing spaces as that would not be the output I'm looking for but the rest of the function works – python_help Aug 26 '21 at 19:16
  • @DarkKnight Thanks for the suggestions. I have edited my answer. – Dhana D. Aug 27 '21 at 03:38
  • @python_help Thanks for the suggestions. I have edited my answer. Please also kindly upvote and accept my answer if you find it helpful or it solves your problem. – Dhana D. Aug 27 '21 at 03:38

You can use JSONDecoder.raw_decode to incrementally consume the input. Here's an example based on the source of decode():

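import json
import json.decoder

def json_decode_many(s):
  decoder = json.JSONDecoder()
  _w = json.decoder.WHITESPACE.match  # matches a run of JSON whitespace

  idx = 0

  while True:
    idx = _w(s, idx).end()  # skip leading whitespace
    if idx >= len(s):
      break  # nothing left but whitespace
    # raw_decode returns the decoded object and the index where it ended
    obj, idx = decoder.raw_decode(s, idx=idx)
    yield obj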

Then usage looks like

>>> input_string = """
{
  "name": "bob"
}
{
  "name": "sarah"
}
"""
>>> for x in json_decode_many(input_string):
...   print("Decoded:", x)
...
Decoded: {'name': 'bob'}
Decoded: {'name': 'sarah'}
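
To apply the same idea to a file, as in the question, something like the following might work. It is a sketch rather than part of the approach above: `iter_json_file` is a hypothetical helper name, UTF-8 is an assumed encoding, and the whole file is still read into memory before decoding starts.

```python
import json
import json.decoder
import os
import tempfile


def iter_json_file(path):
    """Yield each top-level JSON value found in the file at `path`."""
    decoder = json.JSONDecoder()
    skip_ws = json.decoder.WHITESPACE.match
    # Assumption: the file is UTF-8 and small enough to hold in memory.
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    idx = skip_ws(text, 0).end()
    while idx < len(text):
        # Decode one value starting at idx; idx moves to just past it.
        obj, idx = decoder.raw_decode(text, idx)
        yield obj
        idx = skip_ws(text, idx).end()


# Demo: write the question's sample to a temporary file and read it back.
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('{\n    "name": "bob"\n}\n{\n    "name": "sarah"\n}\n')

names = [obj["name"] for obj in iter_json_file(tmp.name)]
os.remove(tmp.name)
print(names)
```

Because the decoder works on object boundaries rather than on text replacement, values containing braces or newlines are not a problem here.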
orip