I have a huge JSON file (lots of smaller .log files in JSON Lines format combined together, 8 GB in total), composed of multiple different objects, with one object per row. I want to read this file into a pandas DataFrame, but I am only interested in the entries for one specific object, which would drastically reduce the amount of data to read. Can the file be filtered with pandas or plain Python before loading it into a DataFrame?
My current code is as follows:
import glob

import pandas as pd

# read every .log file as JSON Lines and stack them into one DataFrame
df = pd.concat(
    [pd.read_json(f, encoding="ISO-8859-1", lines=True) for f in glob.glob("logs/sample1/*.log")],
    ignore_index=True,
)
As you might imagine, this is very computationally heavy and takes a long time to complete. Is there a way to filter the data before reading it into a DataFrame?
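One option I have looked at is pandas' own chunked reading, though if I understand correctly this still parses every line before I can drop it (a minimal sketch; the chunksize value is an arbitrary assumption):

import glob

import pandas as pd

frames = []
for f in glob.glob("logs/sample1/*.log"):
    # chunksize (with lines=True) makes read_json return an iterator of
    # DataFrames instead of loading a whole file at once
    for chunk in pd.read_json(f, encoding="ISO-8859-1", lines=True, chunksize=100_000):
        # keep only the rows for the object of interest; astype(str) guards
        # against pandas inferring "1" as an integer during parsing
        frames.append(chunk[chunk["Name"].astype(str) == "1"])

df = pd.concat(frames, ignore_index=True)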
Sample of Data:
{"Name": "1","variable": "value","X": {"nested_var": 5000,"nested_var2": 2000}}
{"Name": "2","variable": "value","X": {"nested_var": 1222,"nested_var2": 8465}}
{"Name": "2","variable": "value","X": {"nested_var": 123,"nested_var2": 865}}
{"Name": "1","variable": "value","X": {"nested_var": 5500,"nested_var2": 2070}}
{"Name": "2","variable": "value","X": {"nested_var": 985,"nested_var2": 85}}
{"Name": "2","variable": "value","X": {"nested_var": 45,"nested_var2": 77}}
I want to read only the entries where "Name" is "1".
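To make the goal concrete, this is the kind of pre-filtering I am imagining in plain Python: skim every line with a cheap substring test, run json.loads only on the matches, and hand pandas just those records (a minimal sketch; the needle string assumes "Name" is formatted exactly as in the sample above, and pd.json_normalize is used to flatten the nested "X" object):

import glob
import json

import pandas as pd

def matching_records(pattern, needle='"Name": "1"'):
    """Yield parsed JSON objects only for lines that pass a cheap substring test."""
    for path in glob.glob(pattern):
        with open(path, encoding="ISO-8859-1") as fh:
            for line in fh:
                # the substring check avoids json.loads on most unwanted lines
                if needle in line:
                    record = json.loads(line)
                    # confirm the match, since the substring could appear elsewhere
                    if record.get("Name") == "1":
                        yield record

# json_normalize flattens the nested "X" object into X.nested_var columns
df = pd.json_normalize(matching_records("logs/sample1/*.log"))

Would something along these lines be a reasonable approach, or does pandas offer something faster?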