
I've been trying to use this code to read a huge JSON file (it contains 80+ million records) line by line:

import json
import pandas as pd


lines = []

with open('file_path', 'r') as f:
    for line in f:
        lines.append(json.loads(line))

# Build the DataFrame once, after every line has been parsed
df = pd.DataFrame(lines)

But this gives an error:

JSONDecodeError: Expecting property name enclosed in double quotes

Then I tried replacing the single quotes with double quotes, using the code below:

import json
import pandas as pd


lines = []
jstr = ""


with open('filepath', 'r') as f:
    for line in f:
        # Swap every single quote for a double quote before parsing
        jstr = line.replace("'", '"')
        lines.append(json.loads(jstr))

# Build the DataFrame once, after every line has been parsed
df = pd.DataFrame(lines)

But I can only read the first six rows before I get this error:

JSONDecodeError: Expecting ',' delimiter

The JSON is confirmed to be in a valid format, but I don't know what to do.

Could anyone help me solve this problem?
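
One possible explanation, offered as an assumption about the data: if each line is a Python dict literal (single-quoted keys) rather than strict JSON, a blanket `replace("'", '"')` also rewrites apostrophes inside string values, which could produce this kind of delimiter error. A minimal sketch that tries `json.loads` first and falls back to `ast.literal_eval`; `parse_line` is a hypothetical helper and the sample lines are made up for illustration:

```python
import ast
import json


def parse_line(line):
    """Parse one record that is either strict JSON or a Python dict literal."""
    line = line.strip()
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        # Assumption: the line is a Python literal such as {'a': 1}.
        # literal_eval only evaluates literals; it does not run arbitrary code.
        return ast.literal_eval(line)


# Made-up sample lines: one strict JSON, one single-quoted with an apostrophe.
sample_lines = [
    '{"id": 1, "name": "plain json"}',
    "{'id': 2, 'name': \"it's quoted\"}",
]
records = [parse_line(line) for line in sample_lines]
```

Note that `ast.literal_eval` does not understand the JSON keywords `true`, `false`, or `null`, so this fallback only helps when the lines really are Python literals.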

1 Answer


Maybe you are looking for this?

import pandas as pd

df = pd.read_json('data/simple.json')
Tau n Ro
  • It gives a MemoryError – user13619579 May 25 '22 at 14:33
  • I just found this: `df = pd.read_json("test.json", orient="records", lines=True, chunksize=5)` [source](https://datascience.stackexchange.com/questions/60268/load-large-jsons-file-into-pandas-dataframe) – Tau n Ro May 25 '22 at 14:44
  • It works, but when I try to convert it to a DataFrame with `data = pd.DataFrame(df)`, it gives `ValueError: Expected object or value` – user13619579 May 25 '22 at 15:05
  • @user13619579 It could be this: your JSON is malformed [source](https://stackoverflow.com/questions/44287011/valueerror-expected-object-or-value-when-reading-json-as-pandas-dataframe). You need to understand how the function [read_json](https://pandas.pydata.org/docs/reference/api/pandas.read_json.html) works to be able to use it. – Tau n Ro May 25 '22 at 15:28
  • I tried everything but it doesn't work – user13619579 May 25 '22 at 16:55
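
For reference, when `chunksize` is set, `pd.read_json(..., lines=True)` returns an iterator of DataFrame chunks rather than a single DataFrame, so it is normally consumed with a loop. A minimal sketch, assuming the file holds one valid JSON object per line; the temporary file, its columns, and the chunk size of 2 are made up for illustration:

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the real file: one JSON object per line.
rows = [{"id": i, "value": i * 10} for i in range(5)]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    for row in rows:
        tmp.write(json.dumps(row) + "\n")
    path = tmp.name

total_rows = 0
value_sum = 0
# Each iteration yields a DataFrame with at most `chunksize` rows,
# so only one chunk needs to be held in memory at a time.
with pd.read_json(path, lines=True, chunksize=2) as reader:
    for chunk in reader:
        total_rows += len(chunk)
        value_sum += int(chunk["value"].sum())

os.remove(path)
```

Aggregating or filtering each chunk inside the loop, instead of concatenating all chunks, is what keeps memory use bounded; concatenating 80+ million records back into one DataFrame may still raise a MemoryError.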