I have 1500 JSON files that look like this:
[
    {
        "info1": {
            "name": "John",
            "age": "50",
            "country": "USA"
        },
        "info2": {
            "id1": "129",
            "id2": "151",
            "id3": "196"
        },
        "region": [
            {
                "id": "36",
                "name": "Spook",
                "spot": "2"
            },
            {
                "id": "11",
                "name": "Ghoul",
                "spot": "6"
            },
            {
                "id": "95",
                "name": "Devil",
                "spot": "4"
            }
        ]
    },
    {
        "info1": {
            "name": "Mark",
            "age": "33",
            "country": "Brasil"
        },
        "info2": {
            "id1": "612",
            "id2": "221",
            "id3": "850"
        },
        "region": [
            {
                "id": "68",
                "name": "Ghost",
                "spot": "7"
            },
            {
                "id": "75",
                "name": "Spectrum",
                "spot": "2"
            },
            {
                "id": "53",
                "name": "Phantom",
                "spot": "2"
            }
        ]
    }
]
I have loaded the important information from the JSON files into a data frame and added a column with the JSON file name. My code:
import glob
import os

import pandas as pd

path_to_json = 'my files_directory'
json_files = glob.glob(os.path.join(path_to_json, "*.json"))

names = []
dates = []
for file_ in json_files:
    df = pd.read_json(file_)
    df = df.drop(columns=['info1', 'info2'])  # these columns are not important to me, so I drop them
    for x in df['region']:
        for name in x:
            names.append(name['name'])
            dates.append(file_)

df2 = pd.DataFrame(columns=['name', 'date'])
df2['name'] = names
df2['date'] = dates
My data frame looks like this:
       name           date
0     Spook  20191111.json
1     Ghoul  20191111.json
2     Devil  20191111.json
3     Ghost  20191111.json
4  Spectrum  20191111.json
5   Phantom  20191111.json
This output is exactly what I want, but with 1500 JSON files in the folder, loading everything into the data frame takes ages. This probably comes from using the append() function. How can I modify this code to speed up loading these JSON files?
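For reference, here is a rough sketch of the kind of restructuring I was considering (I am not sure it is the right approach): read each file with the json module instead of pd.read_json, collect plain dicts across all files, and build the data frame once at the end. The directory path is the same placeholder as above, and os.path.basename is only there so the date column matches the file names shown in my output.

import glob
import json
import os

import pandas as pd

path_to_json = 'my files_directory'  # placeholder directory
json_files = glob.glob(os.path.join(path_to_json, "*.json"))

rows = []  # one dict per region entry, collected across all files
for file_ in json_files:
    with open(file_) as f:
        records = json.load(f)  # each file holds a list of records
    for record in records:
        for entry in record['region']:
            rows.append({'name': entry['name'], 'date': os.path.basename(file_)})

# build the data frame in one go instead of growing it file by file
df2 = pd.DataFrame(rows, columns=['name', 'date'])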
Thank you in advance for your help.