I am aware that this question, in some form, has been asked numerous times. I have read through most, if not all, the pertinent questions but cannot seem to find the answer to my problem (or I have misunderstood the other answers). I will provide a list of all the related questions at the end.
The premise of the problem is simple: I have json that I wish to make into a pandas dataframe. There are many levels to this json, and it is at one of the deeper levels that I am running into issues.
I will first give some example data/schema and then detail some of my failed attempts.
The following data is a shortened version of the schema I will be working with, though I have retained the same number of levels.
json_data = {
'_source': {
'datetime': '',
'filters': {
'group': {
'filter_value': ''
}
},
'response': {
'option_value': {
'id': '',
'created': '',
'name': '',
'country': {
'name': ''
},
'currency': {
'symbol': ''
},
'toUSD': 0.0,
'partition': {
'items': [
{
'id': '',
'created': '',
'name': '',
'quoteNumber': '',
'scenario': {
'id': '',
'startDate': '',
'endDate': '',
'name': '',
'fragment': {
'items': [
{
'id': '',
'name': '',
'country': {
'name': ''
}
}
]
}
}
}
]
}
}
}
}
}
I am able to unpack the json down to the level of partition
and those keys listed underneath it using the following function:
pd.json_normalize(json_data,
record_path=['_source', 'response', 'option_value', 'partition', 'items'],
errors='ignore')
This correctly unpacks the json data with a single column that contains further json data called scenario.fragment.items
. It is this part of the json that I wish to unpack.
My first attempt was the following (I do not need to implement the meta
parameter in this instance):
pd.json_normalize(json_data,
record_path=['_source', 'response', 'option_value', 'partition', 'items', 'scenario', 'fragment', 'items'],
errors='ignore')
However, this gave me a {KeyError}'fragment
error message.
My suspicion was that it was due to the fact that this is a list of dicts and so the json_normalize
function was unable to correctly traverse it at this depth. Therefore, I extracted just the scenario.fragment.items
column from my first attempt and attempted to convert this Series
into json and then back again into a Dataframe:
series_data = pd.json_normalize(json_data,
record_path=['_source', 'response', 'option_value', 'partition', 'items'],
errors='ignore')['scenario.fragment.items']
series_data_as_json = series_data.to_json()
Unfortunately, at this point I am unable to convert the series_data_as_json
back into a Dataframe with pd.json_normalize
with the following error: {AttributeError}'str' object has no attribute 'values'
.
So my question is therefore how to unpack this deeply nested list of lists. Am I missing something with my first attempt at using pd.json_normalize
or should I pursue the Series
approach? Or is there another method entirely?
I have read, and attempted the solutions from, the following:
- Fast convert JSON column into Pandas dataframe
- Pandas Series to json and back
- Convert Nested JSON in to Dataframe using json_normalize
- How to flatten a nested JSON into a pandas dataframe
- Convert Pandas Series to Dictionary Without Index
- Convert list of dictionaries to a pandas DataFrame
- Converting JSON to pandas DataFrame- Python
- Convert JSON file to Pandas dataframe
- Convert JSON nested list of dict to DataFrame