
I'm trying to write a program that can identify which delimiter characters are missing from a JSON file and insert them at the right positions. For instance, suppose my JSON looks like this:

{
    "array": [{
        "id": "123",
        "info": {
            "name": "something"
        }
        "address": {
            "street": "Dreamland"
        }
    }]
}

This JSON is invalid since there's no , between } and "address".

Here's the thing: when I call json.loads(my_json.read()), it raises a JSONDecodeError that tells me which character is missing and at what position it should be, like so:

json.decoder.JSONDecodeError: Expecting ',' delimiter: line 7 column 9 (char 116)

I've already tried printing my_json.seek(116), but that didn't work at all: it just printed the number 116.

So my question is: is it possible to actually use the exception error message to fix the JSON since it shows what delimiter character is missing and at what position it should be, or should I just use a stack data structure to keep track of all possible delimiters and insert the missing ones? Or is there a better way to do this?

I've seen other questions in here like this one and this one but the first one is an encoding issue, which is not my case, and the second one is for a very specific case.

Sorry if some of this seems dumb, I'm still learning how to code in Python.

Thanks for your time :)

Daniel Bertoldi

1 Answer


You can slice the string at the position where the ',' delimiter is missing (stored as the pos attribute of the exception object) and join the two halves with ','. Repeating this until the string parses inserts every missing comma:

import json

s = '[{"a":1}{"b":2}{"c":3}]'
while True:
    try:
        data = json.loads(s)
        break
    except json.JSONDecodeError as e:
        # Only a missing ',' can be repaired this way; re-raise anything else.
        if e.msg != "Expecting ',' delimiter":
            raise
        # e.pos is the index in s where the parser expected the comma.
        s = s[:e.pos] + ',' + s[e.pos:]
print(s)
print(data)

This outputs:

[{"a":1},{"b":2},{"c":3}]
[{'a': 1}, {'b': 2}, {'c': 3}]
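
The same loop can be applied to the JSON from the question. The following is a minimal sketch rather than a general repair tool: the function name `insert_missing_commas` and the `max_fixes` guard are assumptions added for illustration, and it only handles the "Expecting ',' delimiter" case.

```python
import json


def insert_missing_commas(text, max_fixes=100):
    """Insert a ',' wherever json.loads reports one missing.

    Hypothetical helper: the name and the max_fixes guard are not part
    of the json module. Returns the repaired text and the parsed data.
    """
    for _ in range(max_fixes):
        try:
            return text, json.loads(text)
        except json.JSONDecodeError as e:
            # Anything other than a missing comma is passed on to the caller.
            if e.msg != "Expecting ',' delimiter":
                raise
            # e.pos is the index where the parser expected the comma.
            text = text[:e.pos] + "," + text[e.pos:]
    raise ValueError("gave up after %d insertions" % max_fixes)


broken = """{
    "array": [{
        "id": "123",
        "info": {
            "name": "something"
        }
        "address": {
            "street": "Dreamland"
        }
    }]
}"""

fixed, data = insert_missing_commas(broken)
print(data["array"][0]["address"]["street"])  # Dreamland
```

Note that the comma ends up directly in front of "address" rather than right after the closing brace, because pos points at the first non-whitespace character where the comma was expected. The result is still valid JSON; if the layout matters, re-serializing with json.dumps(data, indent=4) is one option.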
blhsing
  • You can also use the `colno`, `lineno`, and/or `pos` attributes of the `JSONDecodeError` to save on some regex-ing. To make it even more simple, you can also pre-minify it to get rid of the whitespace, then you can just use `pos` to find the insertion point: `minified = ''.join(x.strip() for x in s.splitlines())` – b_c Oct 28 '19 at 21:38
  • Updated to use the `pos` attribute as suggested. Thanks! Don't think minification makes any difference though: the `pos` attribute is just as accurate even with redundant whitespace. – blhsing Oct 28 '19 at 21:48
  • Good point! For some reason I thought `pos` and `colno` were the same. – b_c Oct 29 '19 at 14:00
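
To illustrate the difference between `pos` and `colno` discussed in these comments, here is a small sketch. The two-line string `doc` is made up for illustration; `pos`, `lineno`, and `colno` are the attributes exposed by `json.JSONDecodeError`.

```python
import json

# Made-up two-line document with a comma missing before the second object.
doc = '[{"a": 1}\n {"b": 2}]'

try:
    json.loads(doc)
except json.JSONDecodeError as e:
    err = e

# pos is a 0-based index into the whole string;
# lineno and colno are 1-based and relative to the line.
line_start = doc.rfind("\n", 0, err.pos) + 1
print(err.pos, err.lineno, err.colno)   # 11 2 2
print(err.pos - line_start + 1)         # 2, the same value as colno
print(repr(doc[err.pos]))               # '{'
```

Because `pos` indexes the string that was passed to json.loads, slicing that string at `pos` works regardless of line breaks. Calling seek on the file object only moves the file cursor and returns the new offset, which is why printing my_json.seek(116) showed 116.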