0

I am trying to read a file whose contents are in the following format:

{"op":"mcm","clk":"6394474220","pt":1523095339090,"mc":[{"id":"1.141299528","rc":[{"atb":[[10,5.56]],"id":30246}],"con":true,"img":false}]}
{"op":"mcm","clk":"6394627886","pt":1523096762118,"mc":[{"id":"1.141299528","rc":[{"atb":[[10.5,20.78]],"id":30246}],"con":true,"img":false}]}
{"op":"mcm","clk":"6394647672","pt":1523096790720,"mc":[{"id":"1.141299528","rc":[{"atb":[[10,22.23]],"id":30246}],"con":true,"img":false}]}

I am trying to read it as JSON, but the file seems to contain multiple JSON documents. When I try to read it using:

connection_file = open(filepath, 'r')
conn_string = json.load(connection_file)

It gives an error:

json.decoder.JSONDecodeError: Extra data: line 2 column 1

Please let me know how to read such files.

Mike Scotty
star_kid
  • Take your JSON and paste it into https://jsonlint.com/ and click "Validate json". It will tell you whether your JSON is valid – Daniel Jul 30 '18 at 09:35
  • It seems you have a file where each line is a JSON string, but the file as a whole is not valid JSON. – Mike Scotty Jul 30 '18 at 09:36
  • Following on from @Mike's observation, that means you should loop over the file line by line and use `json.loads` on each line instead of `json.load` for the entire lot. – Jon Clements Jul 30 '18 at 09:36
  • It says: Error: Parse error on line 16: ... "img": false }]} { "op": "mcm", "cl ---------------------^ Expecting 'EOF', '}', ',', ']', got '{' But how to fix this? – star_kid Jul 30 '18 at 09:37
  • For the sake of completion: you need to make sure / agree with whoever produces this JSON that the delimiter between JSON "documents" in the input will be a new line. That is, that new lines are not allowed within a single JSON document as they usually would be. – millimoose Jul 30 '18 at 09:39
  • Possible duplicate of [multiple Json objects in one file extract by python](https://stackoverflow.com/questions/27907633/multiple-json-objects-in-one-file-extract-by-python) – Mike Scotty Jul 30 '18 at 09:39
  • (While it is possible to hypothetically load a stream of JSON documents that are prettyprinted, most parsing libraries aren't built to handle that and won't make your life easier.) – millimoose Jul 30 '18 at 09:39
  • @MikeScotty - nice catch, I've never heard of raw_decode being able to do that, might come in handy if my job search goes well – millimoose Jul 30 '18 at 09:40
  • @millimoose it looks to follow the [jsonlines](http://jsonlines.org/) specification (or [ndjson](http://ndjson.org/)). I have seen it used a few times for repeatable output formats e.g. timed lines in log files. – roganjosh Jul 30 '18 at 09:41
  • @roganjosh I mean… it’s nice that it has a name, great that it has a spec, but there seems to be a lack when it comes to general purpose parsing libraries – millimoose Jul 30 '18 at 10:35
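
The comments above mention `raw_decode` as a way to read a stream of JSON documents even when they are not strictly one per line (for example, pretty-printed ones). A minimal sketch of that idea follows; the helper name `iter_json_documents` and the sample string are made up for illustration, and it assumes the whole input fits in memory as one string.

```python
import json


def iter_json_documents(text):
    """Yield each JSON document found in `text`, one after another.

    Hypothetical helper: documents may be separated by any whitespace,
    including newlines inside a pretty-printed document.
    """
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos == end:
            break
        # raw_decode returns the parsed object and the index where it stopped.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Illustrative input: three documents, the first one spanning two lines.
sample = '{"op": "mcm",\n "clk": "1"}\n{"op": "mcm", "clk": "2"} {"op": "mcm", "clk": "3"}'
docs = list(iter_json_documents(sample))
```

For a file that really does hold one document per line, looping over the lines with `json.loads` is simpler and does not require reading the whole file at once.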

2 Answers

1

Your file as a whole isn't valid JSON. Each line is its own JSON document, and the documents are delimited by newlines. You can use this to get them all as a list:

import json

with open(filepath) as f:
    jsons = list(map(json.loads, f))
# jsons is now a list of the parsed objects, one per line of the file.
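
The one-liner above raises the same kind of `JSONDecodeError` if the file contains an empty line, because `json.loads` cannot parse an empty string. A slightly more defensive sketch is shown below; the function name `read_json_lines`, the UTF-8 encoding, and the error message format are assumptions for illustration, not part of the original answer.

```python
import json


def read_json_lines(path):
    """Yield one parsed object per non-blank line of the file at `path`.

    Hypothetical helper; assumes the file is UTF-8 encoded.
    """
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines instead of failing on them
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Re-raise with the file name and line number for easier debugging.
                raise ValueError(f"{path}:{lineno}: invalid JSON ({exc.msg})") from exc
```

Because it is a generator, `list(read_json_lines(filepath))` gives the same list as before, while a plain `for` loop over it processes one record at a time without holding the whole file in memory.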
Reut Sharabani
1

It looks like the file is not a proper JSON file but contains one JSON document per line. So read the file line by line and parse each line with json.loads:

>>> import json
>>> with open('tmp.txt') as f:
...     json_list = [json.loads(line) for line in f]
... 
>>> json_list
[{'op': 'mcm', 'clk': '6394474220', 'pt': 1523095339090, 'mc': [{'id': '1.141299528', 'rc': [{'atb': [[10, 5.56]], 'id': 30246}], 'con': True, 'img': False}]}, {'op': 'mcm', 'clk': '6394627886', 'pt': 1523096762118, 'mc': [{'id': '1.141299528', 'rc': [{'atb': [[10.5, 20.78]], 'id': 30246}], 'con': True, 'img': False}]}, {'op': 'mcm', 'clk': '6394647672', 'pt': 1523096790720, 'mc': [{'id': '1.141299528', 'rc': [{'atb': [[10, 22.23]], 'id': 30246}], 'con': True, 'img': False}]}]
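
Once the lines are parsed, each element is an ordinary Python dict, so the nested values can be reached with normal indexing. The sketch below flattens the `atb` pairs into tuples; the key names come from the sample in the question, but what they mean is not stated there, so the loop variable names are deliberately neutral and the row layout is only an assumption.

```python
import json

# Two lines copied from the sample in the question.
lines = [
    '{"op":"mcm","clk":"6394474220","pt":1523095339090,"mc":[{"id":"1.141299528","rc":[{"atb":[[10,5.56]],"id":30246}],"con":true,"img":false}]}',
    '{"op":"mcm","clk":"6394627886","pt":1523096762118,"mc":[{"id":"1.141299528","rc":[{"atb":[[10.5,20.78]],"id":30246}],"con":true,"img":false}]}',
]
records = [json.loads(line) for line in lines]

rows = []
for record in records:
    for mc_item in record.get("mc", []):
        for rc_item in mc_item.get("rc", []):
            for pair in rc_item.get("atb", []):
                # One row per atb pair: (pt, mc id, rc id, first value, second value)
                rows.append((record["pt"], mc_item["id"], rc_item["id"], pair[0], pair[1]))
```

Using `.get(..., [])` keeps the loop from failing on records that lack one of the nested keys, which may or may not occur in the real file.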
Sunitha