I have JSON data like the sample below. There are 800 objects stored in a single file called test.json, but the file as a whole is not valid JSON. I have shown 2 of the 800 objects here:

{
    "_id": {
        "$oid": "592638e163690a5c1f8f73e2"
    },
    "title": "simplifying fractions",
    "url": "some_url",
    "difficulty": "easy",
    "webview": "",
    "id": 0
} {
    "_id": {
        "$oid": "592638e163690a5c1f8f73f5"
    },
    "title": "patterns overlap",
    "url": "some_url",
    "difficulty": "hard",
    "webview": "",
    "id": 1
}

When I run the two JSON objects above through jsonlint.com, I get a parse error at line 10. I want to convert the file into something like this, which passes on jsonlint.com:

{
    "0": {
        "_id": {
            "$oid": "592638e163690a5c1f8f73e2"
        },
        "title": "simplifying fractions",
        "url": "some_url",
        "difficulty": "easy",
        "webview": "",
        "id": 0
    },
    "1": {
        "_id": {
            "$oid": "592638e163690a5c1f8f73f5"
        },
        "title": "patterns overlap",
        "url": "some_url",
        "difficulty": "hard",
        "webview": "",
        "id": 1
    }
}

This second version passes the lint. In the first version I simply have 800 JSON objects one after another, and I want to convert them into the form above: one big dictionary at the top level, with keys like "0", "1", each followed by the corresponding JSON object. I am not sure how to start writing the Python script. Can someone give me a hint or some starter code showing how to parse the original, invalid JSON?

  • Possible duplicate of https://stackoverflow.com/questions/18514910/how-do-i-automatically-fix-an-invalid-json-string – Antimony May 25 '17 at 22:41

2 Answers

I suggest turning your JSON into a list of objects instead:

[ {}, {}, ... {} ]

You could do that by replacing the leading { with [{, the trailing } with }], and every occurrence of } { with }, {, using search-and-replace in your editor, a custom script, or a command-line tool like sed.

The resulting file would look like:

[{
    "_id": {
        "$oid": "592638e163690a5c1f8f73e2"
    },
    ...
}, {
    "_id": {
        "$oid": "592638e163690a5c1f8f73f5"
    },
    ...
}]

This should parse as valid JSON, and Python's json module will load it as a list of dicts.
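The replacement described above can be sketched in Python roughly as follows. This is only an illustration: the helper name concatenated_to_list and the shortened sample string are made up for the example, and it assumes that no string value in the data contains a } followed by whitespace and a {, since the regular expression cannot tell such text apart from a real object boundary.

```python
import json
import re


def concatenated_to_list(text):
    """Turn '{...} {...}' into a Python list by rewriting it as a JSON array."""
    # Insert a comma wherever one object closes and the next one opens.
    # Assumption: no string value contains "}" + whitespace + "{".
    joined = re.sub(r"\}\s*\{", "}, {", text.strip())
    # Wrapping in [ ] has the same effect as replacing the leading { with [{
    # and the trailing } with }].
    return json.loads("[" + joined + "]")


# Shortened, hypothetical sample in the same shape as the question's data.
sample = '{"_id": {"$oid": "a"}, "id": 0} {"_id": {"$oid": "b"}, "id": 1}'
objects = concatenated_to_list(sample)

# Key each object by its position, which is the layout the question asks for.
keyed = {str(i): obj for i, obj in enumerate(objects)}
print(json.dumps(keyed, indent=4))
```

For the real file, the text would come from reading test.json, and the result could be written back out with json.dump.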

Adam Byrtek

Well, it's not valid JSON, so don't parse it as JSON. Just find the pieces to use in your index... which is done simply by counting the braces.

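# Read the whole input; "infile" is the file of concatenated objects.
with open("infile") as f:
    oldjson = f.read()

elements = []            # finished '"key": {...}' entries, as text
newjson_element = ""     # characters of the object currently being collected
open_brace_counter = 0   # current brace nesting depth
for character in oldjson:
    newjson_element += character
    if character == "{":
        open_brace_counter += 1
    if character == "}":
        open_brace_counter -= 1
        if open_brace_counter == 0:
            # Depth is back to zero, so one top-level object has just ended.
            elements.append('"' + str(len(elements)) + '": ' + newjson_element.strip())
            newjson_element = ""
        if open_brace_counter < 0:
            print("more closing braces than opened braces - some extra problem?")
if newjson_element.strip() != "":
    print("some characters after the last full element - some extra problem?")

# Join the entries with commas; without them the output is not valid JSON.
newjson = "{\n" + ",\n".join(elements) + "\n}\n"
with open("outfile.json", "w") as outfile:
    outfile.write(newjson)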

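This is a quick script, so it will produce invalid JSON if there are any non-whitespace characters between the final } of one element and the opening { of the next. It also counts braces that appear inside string values, so a title containing { or } would throw off the split.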
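If braces inside strings are a concern, one alternative (a different technique from the brace counting above, not part of the original script) is to let the standard library's json.JSONDecoder.raw_decode consume one object at a time. The sketch below is a minimal illustration; the function name iter_concatenated and the sample text are assumptions made up for the example.

```python
import json


def iter_concatenated(text):
    """Yield each top-level JSON value from a string of back-to-back values."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        # Returns the parsed value and the index just past where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Hypothetical sample; note the braces inside the first title.
sample = '{"title": "a } { b", "id": 0}\n{"title": "c", "id": 1}'
keyed = {str(i): obj for i, obj in enumerate(iter_concatenated(sample))}
print(json.dumps(keyed, indent=4))
```

Because the decoder does the parsing, stray non-whitespace text between objects raises json.JSONDecodeError instead of silently ending up in the output.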

Mikhail Ramendik