
I've been trying to analyze data that is supposedly in JSON format, but the objects are not separated by commas. Here is a sample of my data:

{
  "areaId": "Tracking001",
  "areaName": "Learning Theater Indoor",
  "color": "#99FFFF"
}
{
  "areaId": "Tracking001",
  "areaName": "Learning Theater Indoor",
  "color": "#33CC00"
}

There are thousands of them, so separating them manually is not feasible. My question: do I have to separate the objects with commas, add an overarching key, and make everything else its value in order to analyze the data? I'm a beginner at data analysis, especially with JSON-formatted data, so any tips would be appreciated.

Vadim Kotov
Sohbet
  • Does this answer your question? [How to extract multiple JSON objects from one file?](https://stackoverflow.com/questions/27907633/how-to-extract-multiple-json-objects-from-one-file) – ggorlen Sep 29 '22 at 18:19

1 Answer


The raw_decode(s) method of json.JSONDecoder sounds like what you need. To quote from its docstring:

raw_decode(s): Decode a JSON document from s (a str beginning with a JSON document) and return a 2-tuple of the Python representation and the index in s where the document ended. This can be used to decode a JSON document from a string that may have extraneous data at the end.

Example usage:

import json

s = """{
  "areaId": "Tracking001",
  "areaName": "Learning Theater Indoor",
  "color": "#99FFFF"
}
{
  "areaId": "Tracking001",
  "areaName": "Learning Theater Indoor",
  "color": "#33CC00"
}"""
decoder = json.JSONDecoder()
v0, i = decoder.raw_decode(s)
v1, _ = decoder.raw_decode(s[i:].lstrip())  # strip the line break between the two objects

Now v0 and v1 hold the parsed JSON values.

You may want to use a loop if you have thousands of values:

import json

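with open("some_file.txt", "r") as f:
    # raw_decode fails on leading whitespace, so strip it up front.
    content = f.read().lstrip()
parsed_values = []
decoder = json.JSONDecoder()
while content:
    # new_start is the index just past the value that was parsed.
    value, new_start = decoder.raw_decode(content)
    # Drop the parsed part and any whitespace before the next object.
    content = content[new_start:].lstrip()
    # You can handle the value directly in this loop:
    print("Parsed:", value)
    # Or you can store it in a container and use it later:
    parsed_values.append(value)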

Parsing 1000 of the JSON values above with this code took about 0.03 seconds on my computer. However, it becomes inefficient for larger files: it reads the complete file into memory, and every slice content[new_start:] copies the remaining text.
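The repeated slicing can be avoided by passing a start index to raw_decode instead of cutting the string. Below is a minimal sketch of that idea, assuming the whole file still fits in memory; the function name iter_json_objects and the shortened sample string are made up for this illustration and are not part of the json module.

```python
import json


def iter_json_objects(text):
    """Yield each JSON value from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while True:
        # Skip whitespace between documents; raw_decode does not accept it.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            return
        # The second argument is the index to start parsing at; the returned
        # index is the position in text just past the parsed value.
        value, pos = decoder.raw_decode(text, pos)
        yield value


# Shortened, hypothetical sample in the same shape as the question's data.
sample = '\n{"color": "#99FFFF"}\n{"color": "#33CC00"}\n'
colors = [obj["color"] for obj in iter_json_objects(sample)]
```

Because this is a generator, each object can be processed as soon as it is parsed, without keeping all parsed values in a list.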

pschill
  • I see. Thanks for your answer, but since there are thousands of these objects in a JSON file (not stored as a Python string), I'm not sure whether applying the raw_decode method would work, or how to automatically go through every object without assigning each one to a variable like v0 and v1 – Sohbet Feb 14 '19 at 01:57
  • You could first read the file into a string, then create a loop that applies raw_decode, and then either directly handle the parsed value or store it in a container and handle all of them later. Should I update my answer with an example? – pschill Feb 14 '19 at 07:58
  • It would be really helpful if you could update it to read a file that has thousands of JSON objects (just like the ones shown in my question above) which are NOT separated by commas. I'm very new to parsing JSON, and I've spent a lot of time clicking every link on Google to find out how to do this. Your help would be greatly appreciated. Thanks in advance – Sohbet Feb 14 '19 at 20:40
  • This fails if content has leading whitespace, but a simple `content.lstrip()` prior to the main loop sets things aright. `content[new_start:].strip()` can also be `lstrip()` although it's not a big deal either way. – ggorlen Oct 30 '22 at 23:34
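The leading-whitespace issue from the last comment can be reproduced in a few lines. This is only a small sketch; the input string and the variable names are invented for the demonstration.

```python
import json

decoder = json.JSONDecoder()
text = '\n{"a": 1}'  # hypothetical input that starts with a line break

try:
    decoder.raw_decode(text)
    failed = False
except json.JSONDecodeError:
    # raw_decode expects the JSON document to start at the first character.
    failed = True

# Stripping the leading whitespace first lets the same input parse.
value, end = decoder.raw_decode(text.lstrip())
```

Here end is an index into the stripped string, not into the original text.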