Python: unpack deeply nested list of dicts from json into pandas

Question

I am aware that this question, in some form, has been asked numerous times. I have read through most, if not all, the pertinent questions but cannot seem to find the answer to my problem (or I have misunderstood the other answers). I will provide a list of all the related questions at the end.

The premise of the problem is simple: I have json that I wish to make into a pandas dataframe. There are many levels to this json, and it is at one of the deeper levels that I am running into issues.

I will first give some example data/schema and then detail some of my failed attempts.

The following data is a shortened version of the schema I will be working with, though I have retained the same number of levels.

json_data = {
  '_source': {
    'datetime': '',
    'filters': {
      'group': {
        'filter_value': ''
      }
    },
    'response': {
      'option_value': {
        'id': '',
        'created': '',
        'name': '',
        'country': {
          'name': ''
        },
        'currency': {
          'symbol': ''
        },
        'toUSD': 0.0,
        'partition': {
          'items': [
            {
              'id': '',
              'created': '',
              'name': '',
              'quoteNumber': '',
              'scenario': {
                'id': '',
                'startDate': '',
                'endDate': '',
                'name': '',
                'fragment': {
                  'items': [
                    {
                      'id': '',
                      'name': '',
                      'country': {
                        'name': ''
                      }
                    }
                  ]
                }
              }
            }
          ]
        }
      }
    }
  }
}

I am able to unpack the json down to the level of partition and those keys listed underneath it using the following function:

pd.json_normalize(json_data, 
                  record_path=['_source', 'response', 'option_value', 'partition', 'items'], 
                  errors='ignore')

This correctly unpacks the json data with a single column that contains further json data called scenario.fragment.items. It is this part of the json that I wish to unpack.

My first attempt was the following (I do not need to implement the meta parameter in this instance):

pd.json_normalize(json_data,
                  record_path=['_source', 'response', 'option_value', 'partition', 'items', 'scenario', 'fragment', 'items'],
                  errors='ignore')

However, this gave me a {KeyError}'fragment error message.

My suspicion was that it was due to the fact that this is a list of dicts and so the json_normalize function was unable to correctly traverse it at this depth. Therefore, I extracted just the scenario.fragment.items column from my first attempt and attempted to convert this Series into json and then back again into a Dataframe:

series_data = pd.json_normalize(json_data, 
                  record_path=['_source', 'response', 'option_value', 'partition', 'items'], 
                  errors='ignore')['scenario.fragment.items']

series_data_as_json = series_data.to_json()

Unfortunately, at this point I am unable to convert the series_data_as_json back into a Dataframe with pd.json_normalize with the following error: {AttributeError}'str' object has no attribute 'values'.

So my question is therefore how to unpack this deeply nested list of lists. Am I missing something with my first attempt at using pd.json_normalize or should I pursue the Series approach? Or is there another method entirely?

I have read, and attempted the solutions from, the following:

There's not a sufficient amount of data here to help you; `this gave me a {KeyError}'fragment error message.` can't be reproduced. — Trenton McKinney, Aug 12 '21 at 16:11
Hi @TrentonMcKinney, are you saying that using the `json_data` object within the `pd.json_normalize` function did not give you the same error? If so that is interesting and indicates that perhaps the issue lies elsewhere. — FitzKaos, Aug 13 '21 at 09:01
That's correct. With the snippet shown, the `'items'` key is unpacked without an error. However, this `json` snippet is not representative of the entire `json`. — Trenton McKinney, Aug 13 '21 at 16:57

Python: unpack deeply nested list of dicts from json into pandas

0 Answers0