0

I need your help with converting a multidimensional dict to a pandas data frame. I get the dict from a JSON file which I retrieve from a API call (Shopify).

response = requests.get("URL", auth=("ID","KEY")) 
data = json.loads(response.text)

The "data" dictionary looks as follows:

 {'orders': [{'created_at': '2016-09-20T22:04:49+02:00',
             'email': 'test@aol.com',
             'id': 4314127108,
             'line_items': [{'destination_location': 
                                       {'address1': 'Teststreet 12',
                                        'address2': '',
                                        'city': 'Berlin',
                                        'country_code': 'DE',
                                        'id': 2383331012,
                                        'name': 'Test Test',
                                        'zip': '10117'}, 
                             'gift_card': False,
                             'name': 'Blueberry Cup'},
                             {'destination_location': 
                                       {'address1': 'Teststreet 12',
                                        'address2': '',
                                        'city': 'Berlin',
                                        'country_code': 'DE',
                                        'id': 2383331012,
                                        'name': 'Test Test',
                                        'zip': '10117'}, 
                             'gift_card': False,
                             'name': 'Strawberry Cup'}]
}]}

In this case the dictionary has 4 Dimensions and I would like to convert the dict into a pandas data frame. I tried everything ranging from json_normalize() to pandas.DataFrame.from_dict(), yet I did not manage to get anywhere. When I try to convert the dict to a df, I get columns which contain list of lists.

My goal is to have an individual row per product. Thanks!

Desired Output:

Created at  Email           id          Name
9/20/2016   test@test.de    4314127108  Blueberry Cup
9/20/2016   test@test.de    4314127108  Strawberry Cup

1 Answers1

2

I don't really understand how json_normalize() fails this so hard, I have similar data with twice the nesting depth and json_normalize() still manages to give me a much better result.

I wrote this recursive function to replace the lists in your example with dictionaries:

def removeList(D):
  for k in D.keys():
    if isinstance(D[k],list):
      T = {}
      for i in range(len(D[k])):
        T[str(i)] = D[k][i]
      D[k] = removeList(T)
      return D
    elif isinstance(D[k],dict):
      D[k] = removeList(D[k])
      return D
    else:
      return D

json_normalize() can work with the outcome of that a bit better at least.

However I recommend doing it manually even if it's annoying. You can create your own dictionary with your own desired structure, write all data manually into it and then convert that into your dataframe. It's a good way to check the data for consistency and do all flattening, preprocessing and normalizing you need.

Since I have data that's structured similar to yours, I'm using a two-step process. In the first step I create a flattened dictionary that consists of no other dictionaries but still has a list in one key (that would be line_items in your case). Each list entry is also flattened into a simple dictionary. Then I create a second dataframe from that list of dictionaries like this:

ListDF = pd.DataFrame.from_dict([iFr for sl in DF["List"] for iFr in sl])

Since I did all the normalizing manually I was able to add necessary keys to the list items so now I can just merge the two dataframes into my final dataframe using those keys. Then I drop the List column and my final data structure is complete, I went from a horribly nested dictionary to a simple relational scheme that can be easily worked with.

I figure this would also work best for you.

Khris
  • 3,132
  • 3
  • 34
  • 54
  • Thank you for your help. your function works. however I am not 100% following your second suggestion. see expected result above! – Alexander Eser Sep 22 '16 at 13:48
  • You have a dictionary 4 levels deep. What you want is something that is only 1 level deep so it fits the relational data model. So you need to flatten the structure, put everything, every key on the same level. I'm saying that this is best done manually because you have full control over how your data looks in the end and you can do preprocessing. – Khris Sep 22 '16 at 14:24