3

I am reading a .txt file that contains JSON objects which are not separated by commas. I would like to add commas between the JSON objects and place them all into a JSON list (array).

I have tried json.loads, but I get a JSONDecodeError. So I realized I am supposed to put commas between the different objects in the .txt file.

Below is an example of the .txt file's content:

{
    "@mdate": "2011-01-11",
    "@key": "journals/acta/Saxena96",
    "author": {
        "ftail": "\n",
        "ftext": "Sanjeev Saxena"
    },
    "title": {
        "ftail": "\n",
        "ftext": "Parallel Integer Sorting and Simulation Amongst CRCW Models."
    },
    "pages": {
        "ftail": "\n",
        "ftext": "607-619"
    },
    "year": {
        "ftail": "\n",
        "ftext": "1996"
    },
    "volume": {
        "ftail": "\n",
        "ftext": "33"
    },
    "journal": {
        "ftail": "\n",
        "ftext": "Acta Inf."
    },
    "number": {
        "ftail": "\n",
        "ftext": "7"
    },
    "url": {
        "ftail": "\n",
        "ftext": "db/journals/acta/acta33.htmlfSaxena96"
    },
    "ee": {
        "ftail": "\n",
        "ftext": "http://dx.doi.org/10.1007/BF03036466"
    },
    "ftail": "\n",
    "ftext": "\n"
}{
    "@mdate": "2011-01-11",
    "@key": "journals/acta/Simon83",
    "author": {
        "ftail": "\n",
        "ftext": "Hans-Ulrich Simon"
    },
    "title": {
        "ftail": "\n",
        "ftext": "Pattern Matching in Trees and Nets."
    },
    "pages": {
        "ftail": "\n",
        "ftext": "227-248"
    },
    "year": {
        "ftail": "\n",
        "ftext": "1983"
    },
    "volume": {
        "ftail": "\n",
        "ftext": "20"
    },
    "journal": {
        "ftail": "\n",
        "ftext": "Acta Inf."
    },
    "url": {
        "ftail": "\n",
        "ftext": "db/journals/acta/acta20.htmlfSimon83"
    },
    "ee": {
        "ftail": "\n",
        "ftext": "http://dx.doi.org/10.1007/BF01257084"
    },
    "ftail": "\n",
    "ftext": "\n"
}

Expected Result:

[
{
    "@mdate": "2011-01-11",
    "@key": "journals/acta/Saxena96",
    "author": {
        "ftail": "\n",
        "ftext": "Sanjeev Saxena"
    },
    "title": {
        "ftail": "\n",
        "ftext": "Parallel Integer Sorting and Simulation Amongst CRCW Models."
    },
    "pages": {
        "ftail": "\n",
        "ftext": "607-619"
    },
    "year": {
        "ftail": "\n",
        "ftext": "1996"
    },
    "volume": {
        "ftail": "\n",
        "ftext": "33"
    },
    "journal": {
        "ftail": "\n",
        "ftext": "Acta Inf."
    },
    "number": {
        "ftail": "\n",
        "ftext": "7"
    },
    "url": {
        "ftail": "\n",
        "ftext": "db/journals/acta/acta33.htmlfSaxena96"
    },
    "ee": {
        "ftail": "\n",
        "ftext": "http://dx.doi.org/10.1007/BF03036466"
    },
    "ftail": "\n",
    "ftext": "\n"
},
{
    "@mdate": "2011-01-11",
    "@key": "journals/acta/Simon83",
    "author": {
        "ftail": "\n",
        "ftext": "Hans-Ulrich Simon"
    },
    "title": {
        "ftail": "\n",
        "ftext": "Pattern Matching in Trees and Nets."
    },
    "pages": {
        "ftail": "\n",
        "ftext": "227-248"
    },
    "year": {
        "ftail": "\n",
        "ftext": "1983"
    },
    "volume": {
        "ftail": "\n",
        "ftext": "20"
    },
    "journal": {
        "ftail": "\n",
        "ftext": "Acta Inf."
    },
    "url": {
        "ftail": "\n",
        "ftext": "db/journals/acta/acta20.htmlfSimon83"
    },
    "ee": {
        "ftail": "\n",
        "ftext": "http://dx.doi.org/10.1007/BF01257084"
    },
    "ftail": "\n",
    "ftext": "\n"
}
]

Meghashyam
  • Possible duplicate of https://stackoverflow.com/questions/27907633/multiple-json-objects-in-one-file-extract-by-python/50384432 – Dunes Apr 08 '19 at 04:43
  • Does this answer your question? [how to analyze json objects that are NOT separated by comma (preferably in Python)](https://stackoverflow.com/questions/54663739/how-to-analyze-json-objects-that-are-not-separated-by-comma-preferably-in-pytho) – ggorlen Oct 30 '22 at 23:03

2 Answers

2

You can add a comma between the objects with a regexp:

import re

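# Note: this rewrites every "}{" pair it sees, including any inside string values.
with open('name.txt', 'r') as infile, open('out.txt', 'w') as outfile:
    outfile.write("[\n")
    for line in infile:
        line = re.sub(r'\}\{', '},{', line)  # braces escaped so they are matched literally
        outfile.write('    ' + line)
    outfile.write("]\n")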
harnusek
  • 58
  • 4
  • 1
    Using regex to alter hierarchical/nested structures is a [very, very bad idea](https://stackoverflow.com/a/1732454/7553525) - you could unintentionally alter deeper nested structures or values this way (e.g. `"ftext": "I contain {this}{that}"` would acquire an additional comma). Furthermore, if you want to do simple string replacement, regex is overkill - [`str.replace()`](https://docs.python.org/3/library/stdtypes.html#str.replace) would do the job just fine. – zwer Apr 08 '19 at 04:23
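The pitfall described in the comment above can be reproduced in a few lines. This is only a minimal sketch: the sample string and the variable names are made up for illustration and do not come from the original file.

```python
import json

# Two concatenated objects; the first holds "}{" inside a string value (made-up sample).
raw = '{"ftext": "I contain {this}{that}"}{"ftext": "second"}'

# Blind replacement of every "}{" pair; str.replace is enough, no regex needed.
patched = '[' + raw.replace('}{', '},{') + ']'

records = json.loads(patched)  # still valid JSON, so no error is raised
original_value = "I contain {this}{that}"
corrupted_value = records[0]["ftext"]

print(corrupted_value)                    # the value picked up a comma it never had
print(corrupted_value == original_value)  # False
```

The result still parses, which is what makes this kind of corruption easy to miss.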
0

If you can always guarantee that your JSON will be formatted as in your example, i.e. each object's closing } sits at the very start of a line (no indent) and the next object begins on that same line, you can get by with reading the file into a buffer until you encounter such a line and then sending the buffer to the JSON parser. Rinse and repeat:

import json

parsed = []  # a list to hold individually parsed JSON objects
with open('path/to/your.json') as f:
    buffer = ''
    for line in f:
        if line[0] == '}':  # end of the current JSON object
            parsed.append(json.loads(buffer + '}'))
            buffer = line[1:]
        else:
            buffer += line

print(json.dumps(parsed, indent=2))  # just to make sure it all went well

Which would yield:

[
  {
    "@mdate": "2011-01-11",
    "@key": "journals/acta/Saxena96",
    "author": {
      "ftail": "\n",
      "ftext": "Sanjeev Saxena"
    },
    "title": {
      "ftail": "\n",
      "ftext": "Parallel Integer Sorting and Simulation Amongst CRCW Models."
    },
    "pages": {
      "ftail": "\n",
      "ftext": "607-619"
    },
    "year": {
      "ftail": "\n",
      "ftext": "1996"
    },
    "volume": {
      "ftail": "\n",
      "ftext": "33"
    },
    "journal": {
      "ftail": "\n",
      "ftext": "Acta Inf."
    },
    "number": {
      "ftail": "\n",
      "ftext": "7"
    },
    "url": {
      "ftail": "\n",
      "ftext": "db/journals/acta/acta33.htmlfSaxena96"
    },
    "ee": {
      "ftail": "\n",
      "ftext": "http://dx.doi.org/10.1007/BF03036466"
    },
    "ftail": "\n",
    "ftext": "\n"
  },
  {
    "@mdate": "2011-01-11",
    "@key": "journals/acta/Simon83",
    "author": {
      "ftail": "\n",
      "ftext": "Hans-Ulrich Simon"
    },
    "title": {
      "ftail": "\n",
      "ftext": "Pattern Matching in Trees and Nets."
    },
    "pages": {
      "ftail": "\n",
      "ftext": "227-248"
    },
    "year": {
      "ftail": "\n",
      "ftext": "1983"
    },
    "volume": {
      "ftail": "\n",
      "ftext": "20"
    },
    "journal": {
      "ftail": "\n",
      "ftext": "Acta Inf."
    },
    "url": {
      "ftail": "\n",
      "ftext": "db/journals/acta/acta20.htmlfSimon83"
    },
    "ee": {
      "ftail": "\n",
      "ftext": "http://dx.doi.org/10.1007/BF01257084"
    },
    "ftail": "\n",
    "ftext": "\n"
  }
]

If your case is not as clear-cut (i.e. you can't predict the formatting), you can try one of the iterative/event-based JSON parsers (ijson, for example), which can tell you when a 'root' object is closed so that you can split the parsed JSON objects into a sequence.

UPDATE: On second thought, you don't need anything apart from the built-in json module, even if your concatenated JSON objects are poorly indented or not indented at all. You can use json.JSONDecoder.raw_decode() (and its undocumented second parameter) to traverse your data and look for valid JSON structures iteratively until you've traversed the whole file (or encountered an error). For example:

import json

parser = json.JSONDecoder()
parsed = []  # a list to hold individually parsed JSON structures
with open('test.json') as f:
    data = f.read()
head = 0  # current position in data as we parse
while True:
    # find the nearest opening brace or bracket, whichever comes first
    starts = [i for i in (data.find('{', head), data.find('[', head)) if i != -1]
    if not starts:  # no more JSON structures to look for
        break
    try:
        struct, head = parser.raw_decode(data, min(starts))
        parsed.append(struct)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        break

print(json.dumps(parsed, indent=2))  # make sure it all went well

This should give you the same result as above, but this time it won't depend on } being the first character of a new line whenever your JSON object 'closes'. It should also work for JSON arrays stacked back-to-back.
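The same raw_decode() idea can also be wrapped in a generator, which may be handy when the objects should be processed one at a time. The following is a sketch under a few assumptions, not a definitive implementation: the function name iter_json_documents and the sample string are made up here, and unlike the loop above it raises on malformed input instead of stopping silently.

```python
import json


def iter_json_documents(text):
    """Yield each top-level JSON value found back-to-back in text."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while True:
        # raw_decode() does not skip leading whitespace, so step over it manually.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            return
        # Raises json.JSONDecodeError if the text at pos is not valid JSON.
        value, pos = decoder.raw_decode(text, pos)
        yield value


# Made-up sample: two objects without a separator, then an array on a new line.
sample = '{"@key": "journals/acta/Saxena96"}{"@key": "journals/acta/Simon83"}\n[1, 2]'
documents = list(iter_json_documents(sample))
print(json.dumps(documents, indent=2))
```

Because the whole text is still read into memory first, this does not help with files too large to load at once; an incremental parser would be needed for that case.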

zwer
  • Thank you so much for the help. But the JSON I receive is not well formatted like the one I provided. How do I use ijson to split the JSON based on the root nodes? – Meghashyam Apr 07 '19 at 21:34