I have a problem reading a big JSON file. The error is: JSONDecodeError: Extra data: line 1 column 884 (char 883).

The file test2.json is here: https://github.com/SilverYar/TransportDataMiner

The error is raised by these lines of code:

import json
import string

import nltk
from nltk.corpus import stopwords
from nltk.stem.snowball import RussianStemmer

with open('C:\\Creme\\token\\test2.json') as fin:
    text = json.load(fin)

I don't understand how to fix it. How can I read this file?

Yaroslav
  • Possible duplicate of [multiple Json objects in one file extract by python](https://stackoverflow.com/questions/27907633/multiple-json-objects-in-one-file-extract-by-python) – glibdud Oct 21 '19 at 13:43
  • Error reading large json file. The answers did not help me. – Yaroslav Oct 21 '19 at 13:59
  • The size of the file is not relevant here. The file is not valid JSON. It's a bunch of JSON objects concatenated together. See the dupe target for some ways to deal with it. – glibdud Oct 21 '19 at 14:00

1 Answer

The content of your file is not valid JSON: it contains multiple objects written one after another, with no "," between them and no enclosing "[" and "]".

For example, a valid JSON document holding several objects would be an array:

[{"title": "some text", "subtitle": "some text"},
 {"title": "some text", "subtitle": "some text"},
 {"title": "some text", "subtitle": "some text"}]
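
To see the difference, here is a minimal sketch that feeds both shapes to json.loads. The two sample strings are made up for illustration and are not taken from test2.json; the first one imitates objects written back to back, which is what triggers "Extra data".

```python
import json

# Hypothetical sample data, not the real contents of test2.json.
concatenated = '{"title": "a"}{"title": "b"}'      # two objects back to back
wrapped = '[{"title": "a"}, {"title": "b"}]'       # the same objects as one array

error_message = None
error_position = None
try:
    json.loads(concatenated)
except json.JSONDecodeError as exc:
    # The parser reads the first object, then finds more text after it.
    error_message = exc.msg
    error_position = exc.pos

print(error_message, error_position)

# The array form is a single JSON document, so it parses without complaint.
records = json.loads(wrapped)
print(len(records))
```

The position reported in the error is the index where the second object starts, which is why your traceback points at char 883 and not at the end of the file.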

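A simple hack is to read the file as text, insert the missing commas, and wrap the result in brackets so that it parses as a JSON array. This assumes the objects follow each other directly (a bare }{) and that }{ never appears inside a string value: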

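import json

with open('test2.json', 'r') as fin:
    text = fin.read()

# Insert the missing commas between adjacent objects, then wrap everything in a list.
formatted_text = text.replace('}{', '},{')
json_data = json.loads(f'[{formatted_text}]')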

print(len(json_data))
# 11772
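
If the objects might be separated by whitespace or newlines, or if a string value could itself contain }{, the string replacement above is not safe. A more careful alternative, sketched below, uses json.JSONDecoder.raw_decode from the standard library, which parses one value and returns the index where it stopped. The helper name iter_concatenated_json and the sample string are my own inventions for illustration.

```python
import json

def iter_concatenated_json(text):
    """Yield each top-level JSON value from a string of back-to-back documents."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # Parse one value starting at pos; get back the value and the end index.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Hypothetical sample: note the "}{" inside a string and the newline separator.
sample = '{"title": "a }{ b"}{"title": "c"}\n{"title": "d"}'
records = list(iter_concatenated_json(sample))
print(len(records))
```

For your file, you would pass the result of fin.read() to the helper in place of sample. I have not run this variant against test2.json, so treat it as a starting point.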
Merelda