I have a JSON file that I want to convert into a DataFrame object in Python. I found a way to do the conversion but unfortunately it takes ages, and thus I'm asking if there are more efficient and elegant ways to do the conversion.
I use the json library to open the JSON file as a dictionary, which works fine:
import json

with open('path/file.json') as d:
    file = json.load(d)
Here's some mock data that mimics the structure of the real data set:
dict1 = {'first_level': [{'A': 'abc',
                          'B': 123,
                          'C': [{'D': [{'E': 'zyx'}]}]},
                         {'A': 'bcd',
                          'B': 234,
                          'C': [{'D': [{'E': 'yxw'}]}]},
                         {'A': 'cde',
                          'B': 345},
                         {'A': 'def',
                          'B': 456,
                          'C': [{'D': [{'E': 'xwv'}]}]}]}
Then I create an empty DataFrame and append the data I'm interested in to it with a for loop:
import pandas as pd

df = pd.DataFrame(columns=['A', 'B', 'C'])
for i in range(len(dict1['first_level'])):
    try:
        data = {'A': dict1['first_level'][i]['A'],
                'B': dict1['first_level'][i]['B'],
                'C': dict1['first_level'][i]['C'][0]['D'][0]['E']}
        df = df.append(data, ignore_index=True)
    except KeyError:
        data = {'A': dict1['first_level'][i]['A'],
                'B': dict1['first_level'][i]['B']}
        df = df.append(data, ignore_index=True)
Is there a way to get the data straight from the JSON more efficiently, or can I write the for loop more elegantly?
(Running through the data set (~150k elements) takes over an hour. I'm using Python 3.6.3, 64-bit.)
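For context, here is a sketch of one commonly suggested pattern (not my original code): collect plain dicts in a list and build the DataFrame once at the end, since `df.append` copies the whole frame on every call. Shown here on a trimmed-down version of the mock data above:

```python
import pandas as pd

dict1 = {'first_level': [{'A': 'abc', 'B': 123,
                          'C': [{'D': [{'E': 'zyx'}]}]},
                         {'A': 'cde', 'B': 345}]}

# Collect plain dicts first; construct the DataFrame a single time.
rows = []
for item in dict1['first_level']:
    row = {'A': item['A'], 'B': item['B']}
    try:
        # Same nested lookup as the loop above.
        row['C'] = item['C'][0]['D'][0]['E']
    except KeyError:
        pass  # 'C' stays missing; pandas fills it with NaN
    rows.append(row)

df = pd.DataFrame(rows, columns=['A', 'B', 'C'])
```

I don't know whether this is the fastest option, but it avoids the per-row copy that repeated `append` calls incur.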