
How can I load data from a JSON file that contains long raw text?

I have a big file with news articles. The `body` field contains the news text that I need to analyze.

I tried to read it like this:

import json

with open('file.json', 'r') as openfile:
    # Reading from json file
    dfnew = json.load(openfile)
openfile.close

But I get an error:

Extra data: line 2 column 1 (char 1938)
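A minimal sketch that reproduces the same error with a made-up in-memory string instead of the real file. `json.load` and `json.loads` expect exactly one JSON document, so after parsing the first object they find a second one on the next line and report it as extra data:

```python
import json

# Made-up sample: two JSON objects on consecutive lines (JSON Lines layout)
text = '{"body": "first article"}\n{"body": "second article"}'

err = None
try:
    json.loads(text)  # expects exactly one JSON document
except json.JSONDecodeError as exc:
    # The first object parses fine; the second one is reported as extra data
    err = exc

print(err)  # Extra data: line 2 column 1 (char 26)
```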

Is there a better way to save the data so that it is easy to read back?

I created the file from a DataFrame with this code:

df.to_json('file.json', orient='records', lines=True)
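For reference, a sketch of what that call produces and two ways to read it back, using a small made-up DataFrame in place of the real news data (assumes pandas is installed). With `lines=True`, `to_json` writes one JSON object per line, which `pd.read_json(..., lines=True)` reads directly; without `lines=True`, the same records are written as a single JSON array, which the standard `json` module can load in one call:

```python
import io
import json

import pandas as pd

# Hypothetical stand-in for the real news DataFrame
df = pd.DataFrame({"title": ["a", "b"], "body": ["first article", "second article"]})

# orient='records', lines=True: one JSON object per line (JSON Lines)
jsonl_text = df.to_json(orient="records", lines=True)

# pandas reads that layout back directly when lines=True is passed
df_back = pd.read_json(io.StringIO(jsonl_text), orient="records", lines=True)

# Without lines=True the records form one JSON array,
# so a single json.loads (or json.load on a file) succeeds
array_text = df.to_json(orient="records")
records = json.loads(array_text)

print(df_back["body"].tolist())  # ['first article', 'second article']
print(records[0]["body"])        # first article
```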
  • Does this answer your question? [Python json.loads shows ValueError: Extra data](https://stackoverflow.com/questions/21058935/python-json-loads-shows-valueerror-extra-data) – jonrsharpe Jan 25 '21 at 08:56
  • Your data is in the *JSON Lines* format. You just have to decode every line on its own. – Klaus D. Jan 25 '21 at 09:18

1 Answer

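Your data is in the Newline Delimited JSON (JSON Lines) format, which is what `lines=True` produces: one JSON object per line. Instead of parsing the whole file with `json.load`, parse each line individually with `json.loads`. Also, you don't need to close the file yourself when using a `with` statement; it is closed automatically when the block exits (and `openfile.close` without parentheses would not call the method anyway).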

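import json

records = []
with open('file.json') as openfile:
    for line in openfile:
        record = json.loads(line)  # each line is one complete JSON object
        records.append(record)     # keep every record instead of overwriting
        print(record)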
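For a large file, a generator avoids holding every parsed record in memory at once. A minimal sketch, where `iter_json_lines` is a hypothetical helper name and a temporary file stands in for `file.json`; it skips blank lines and reports the failing line number. Because JSON requires newline characters inside strings to be escaped as `\n`, even long article bodies stay on a single physical line, so parsing line by line is safe:

```python
import json
import os
import tempfile


def iter_json_lines(path):
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                # Re-raise with the file line number to ease debugging
                raise ValueError(f"{path}, line {lineno}: {err}") from err


# Usage with a small temporary file standing in for file.json
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write('{"body": "first article"}\n\n{"body": "second article"}\n')
    path = tmp.name

try:
    bodies = [record["body"] for record in iter_json_lines(path)]
finally:
    os.remove(path)

print(bodies)  # ['first article', 'second article']
```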