I have a series of .json files. Each file contains tweets based on a different keyword. Each line in every file is a JSON object. I read the files using the following code:

import json

# Get the tweet text out of a JSON file that holds one JSON object per line
tweetsFromJSON = []
with open(json_file) as f:
    for line in f:
        json_object = json.loads(line)
        tweet_text = json_object["text"]
        tweetsFromJSON.append(tweet_text)

This works flawlessly for every other JSON file I have, but this particular file gives me the following error:

Traceback (most recent call last):
  File "C:/Users/alexandros/Dropbox/Development/Sentiment Analysis/lda_analysis.py", line 119, in <module>
    lda_analysis('precision_medicine.json', 'precision medicine')
  File "C:/Users/alexandros/Dropbox/Development/Sentiment Analysis/lda_analysis.py", line 46, in lda_analysis
    json_object = json.loads(line)
  File "C:\Users\alexandros\AppData\Local\Programs\Python\Python35-32\lib\json\__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "C:\Users\alexandros\AppData\Local\Programs\Python\Python35-32\lib\json\decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 5287 (char 5286)

So I tried removing the first line to see what happens. The error persists, and it is reported at the exact same position (line 1 column 5287 (char 5286)). I removed another line and got the same result. I'm racking my brain trying to figure out what's wrong. What am I missing?
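
One way to narrow this down is to catch the exception and look at the text around the offset the decoder reports. This is a minimal sketch, assuming Python 3.5+ (where `json.JSONDecodeError` exposes `pos`). The helper name `find_bad_lines`, the `context` width, and the sample data are made up for illustration:

```python
import json


def find_bad_lines(lines, context=20):
    """Return (line_number, char_offset, nearby_text) for each line that fails to parse."""
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            # err.pos is the 0-based offset at which the decoder gave up.
            start = max(err.pos - context, 0)
            problems.append((number, err.pos, line[start:err.pos + context]))
    return problems


# Hypothetical sample: the second line holds two objects back to back.
sample = ['{"text": "a"}', '{"text": "b"}{"text": "c"}']
for number, offset, nearby in find_bad_lines(sample):
    print(number, offset, repr(nearby))
```

An open file can be passed instead of the list, since iterating over a file yields its lines.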

Aventinus
  • Paste your JSON file into an online JSON checker. If it's too large, just paste the last part of the file. It should tell you what's wrong. – Paul Rooney Dec 05 '16 at 10:54
  • Ok, it gets weirder now. According to the online checker the JSON is valid. – Aventinus Dec 05 '16 at 11:01
  • Maybe it's some special character that the decoder fails to parse. – k4ppa Dec 05 '16 at 11:09
  • It works fine for me. [Code](http://ideone.com/HQNHc5) `lines = [line for line in open('personalized_medicine.json')]` and `js = [json.loads(line) for line in lines]`. In other cases where that error shows up, the cause has been more than one JSON document being passed to `json.load(s)` at a time, e.g. `{}{}` is two documents in one, whereas `[{},{}]` is legal. Example [here](http://stackoverflow.com/questions/21058935/python-json-loads-shows-valueerror-extra-data) – Paul Rooney Dec 05 '16 at 11:16
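
If a line really does contain several concatenated documents (the `{}{}` case from the last comment), `json.JSONDecoder.raw_decode` can read them one after another. The thread does not confirm that this is the cause for this file, so treat the following as a sketch under that assumption. The helper name `iter_json_documents` and the sample line are hypothetical:

```python
import json


def iter_json_documents(line):
    """Yield every JSON document found in one line, even if several are concatenated."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(line)
    while pos < end:
        # raw_decode does not skip leading whitespace, so skip it by hand.
        while pos < end and line[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the parsed object and the index where it stopped.
        obj, pos = decoder.raw_decode(line, pos)
        yield obj


# Hypothetical line with two tweets glued together.
texts = [doc["text"] for doc in iter_json_documents('{"text": "a"}{"text": "b"}\n')]
print(texts)
```

In the reading loop from the question, this would replace the single `json.loads(line)` call with an inner loop over `iter_json_documents(line)`.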

0 Answers