
I have a nested dictionary that looks like this:

{
    'product_list.show_date': "May '21",
    'product_list.new_users': {
        'product_list.product': {
            'A': None,
            'B': 377,
            'C': None,
            'D': 67,
            'E': None,
            'F': 1,
            'G': None
        }
    }
}

I want to flatten it so that the parent keys are gone. Basically, I want a dictionary that is not nested, like this:

{
 'product_list.show_date': "May '21",
 'A': None,
 'B': 377,
 'C': None,
 'D': 67,
 'E': None,
 'F': 1,
 'G': None
}

I am using a recursive function to do this, but it doesn't work correctly in every case.

Here's my code:

def clear_nd(d, nested_dict):
    for key in nested_dict:
        if type(nested_dict[key]) != dict:
            d[key] = nested_dict[key]
        elif type(nested_dict[key]) == dict:
            nested_dict = nested_dict[key]
            clear_nd(d, nested_dict)
    
    return d

d = {}
clear_nd(d, nested_dict)
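The problem in the function above is the line `nested_dict = nested_dict[key]`: it rebinds the name that the rest of the loop uses for lookups, so after one nested dict has been processed, the remaining sibling keys are looked up in that inner dict and a `KeyError` is raised (with the second example below this happens at `'prod.product'`). A minimal corrected sketch that recurses on the value without rebinding anything could look like this; the name `flatten_leaves` is hypothetical, and a leaf key that appears more than once simply keeps the last value seen.

```python
def flatten_leaves(nested, out=None):
    """Copy every non-dict value into one flat dict, dropping parent keys.

    Hypothetical helper: a leaf key that appears more than once keeps the
    value seen last.
    """
    if out is None:
        out = {}
    for key, value in nested.items():
        if isinstance(value, dict):
            # Recurse on the child without rebinding `nested`, so the
            # remaining sibling keys are still looked up at this level.
            flatten_leaves(value, out)
        else:
            out[key] = value
    return out


nested_dict = {
    'product_list.show_date': "May '21",
    'product_list.new_users': {
        'product_list.product': {
            'A': None, 'B': 377, 'C': None, 'D': 67,
            'E': None, 'F': 1, 'G': None,
        }
    },
}

print(flatten_leaves(nested_dict))
```

Passing `out` down the recursion means every level writes into the same result dict, so no merging of partial results is needed.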

It doesn't work for the example below:

nested_dict = {
    'product_list.show_date': "May '21",
    'product_list.new_users': {
        'product_list.product': {
            'A': None,
            'B': 377,
            'C': None,
            'D': 67,
            'E': None,
            'F': 1,
            'G': None
        },
        'prod.product': {
            'Alk': None,
            'Bay': 377,
            'Lent': None,
            'R': 67,
            'Ter': None,
            'Wi': 1,
            'e': None
        }
    },
    'duct_list.new_users': {
        'pdust.product': {
            'H': None,
            'y': 377,
            'nt': None,
            'C': 67,
            'sfer': None,
            's': 1,
            'le': None
        }
    }
}
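Note that in this example the leaf key `'C'` appears twice: under `'product_list.product'` (value `None`) and under `'pdust.product'` (value `67`). A flat dict without parent keys can hold only one of them, so one value is silently lost. A sketch that fails loudly instead (the name `flatten_leaves_strict` and its error message are hypothetical):

```python
def flatten_leaves_strict(nested, out=None, path=()):
    """Flatten to leaf keys, raising ValueError if a leaf key repeats.

    Hypothetical helper; `path` records the parent keys for the error message.
    """
    if out is None:
        out = {}
    for key, value in nested.items():
        if isinstance(value, dict):
            flatten_leaves_strict(value, out, path + (key,))
        elif key in out:
            # Dropping parent keys would silently lose one of the two values.
            raise ValueError(f"duplicate leaf key {key!r} under {path!r}")
        else:
            out[key] = value
    return out


# Trimmed-down version of the example above, keeping the clashing 'C'.
example = {
    'product_list.new_users': {'product_list.product': {'A': None, 'C': None}},
    'duct_list.new_users': {'pdust.product': {'H': None, 'C': 67}},
}

try:
    flatten_leaves_strict(example)
    message = None
except ValueError as err:
    message = str(err)

print(message)
# duplicate leaf key 'C' under ('duct_list.new_users', 'pdust.product')
```

Whether to raise, keep the first value, keep the last value, or keep the parent keys is a design choice the question leaves open.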

Does Pandas or any other library have a way to do this? The structure of the nested dictionary is dynamic, so we won't know how deep it is. The keys will also change, so we won't be able to know beforehand which keys are in the dictionary. Any help will be appreciated. Thanks!!
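Unknown depth matters for a recursive solution: each nesting level costs one Python call, and CPython raises `RecursionError` once `sys.getrecursionlimit()` (1000 by default) is exceeded. A sketch that walks the dict with an explicit stack of iterators instead, so depth is bounded only by memory (the name `flatten_leaves_iterative` is hypothetical; it drops parent keys as the question asks):

```python
def flatten_leaves_iterative(nested):
    """Flatten a dict of arbitrary depth to its leaf keys without recursion.

    Hypothetical helper: leaves are visited depth-first in insertion order,
    so a repeated leaf key keeps the value seen last.
    """
    out = {}
    # Each stack entry iterates over one dict's (key, value) pairs; the top
    # entry is the dict currently being scanned.
    stack = [iter(nested.items())]
    while stack:
        try:
            key, value = next(stack[-1])
        except StopIteration:
            stack.pop()  # this level is exhausted, resume its parent
            continue
        if isinstance(value, dict):
            # Descend into the child before the remaining siblings.
            stack.append(iter(value.items()))
        else:
            out[key] = value
    return out


# A dict nested 5000 levels deep, well past the default recursion limit.
deep = {'leaf': 1}
for i in range(5000):
    deep = {f'level{i}': deep}

print(flatten_leaves_iterative(deep))  # {'leaf': 1}
```

Keeping iterators on the stack (rather than whole dicts) preserves the same visiting order as the recursive version, so both resolve duplicate leaf keys the same way.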

terraCoder
  • Does it answer your question? https://stackoverflow.com/questions/52081545/python-3-flattening-nested-dictionaries-and-lists-within-dictionaries – soumya-kole Oct 21 '21 at 18:23

1 Answer


If you allow the lower-level keys to be prefixed with the keys of the higher levels, you can use the Pandas function pandas.json_normalize, which takes a nested dict and turns it into a flat-table Pandas DataFrame.

Then use pandas.DataFrame.to_dict to turn the DataFrame back into a dict. For example:

import pandas as pd

d = {
    'product_list.show_date': "May '21",
    'product_list.new_users': {
        'product_list.product': {
            'A': None,
            'B': 377,
            'C': None,
            'D': 67,
            'E': None,
            'F': 1,
            'G': None
        }
    }
}


pd.json_normalize(d).to_dict('records')[0]

Result:

{'product_list.show_date': "May '21",
 'product_list.new_users.product_list.product.A': None,
 'product_list.new_users.product_list.product.B': 377,
 'product_list.new_users.product_list.product.C': None,
 'product_list.new_users.product_list.product.D': 67,
 'product_list.new_users.product_list.product.E': None,
 'product_list.new_users.product_list.product.F': 1,
 'product_list.new_users.product_list.product.G': None}
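Because the keys here already contain dots, the default `sep='.'` makes the joined names hard to split apart again. `pd.json_normalize` accepts a `sep` argument, so choosing a separator that never occurs inside a key (here `'/'`, which is an assumption about the real data) lets you recover the question's desired output by keeping only the text after the last separator. A sketch:

```python
import pandas as pd

d = {
    'product_list.show_date': "May '21",
    'product_list.new_users': {
        'product_list.product': {
            'A': None, 'B': 377, 'C': None, 'D': 67,
            'E': None, 'F': 1, 'G': None,
        }
    },
}

SEP = '/'  # assumed never to occur inside a real key
flat = pd.json_normalize(d, sep=SEP).to_dict('records')[0]

# Keep only the last path component; keys without SEP stay unchanged.
leaf_only = {key.rsplit(SEP, 1)[-1]: value for key, value in flat.items()}
print(leaf_only)
```

As with any approach that drops the parent keys, two leaves with the same name (such as `'C'` in the question's larger example) collapse into a single entry.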
SeaBean
  • Thanks, @SeaBean. It was a big help for me!! – terraCoder Oct 21 '21 at 18:53
  • @terraCoder Welcome! Pleased to help! :-) – SeaBean Oct 21 '21 at 18:54
  • @terraCoder Your recursive approach is a good idea too! However, recursive calls are inherently slow, so you may run into problems if the dataset gets large. One major complication is that your nested dict is dynamic too! Hence, relying on a function from a well-known package like Pandas is probably one of the best ways to solve it. – SeaBean Oct 21 '21 at 19:02
  • @terraCoder By using the Pandas function, you can also leverage its built-in features, such as controlling the maximum number of levels (depth of the dict) to normalize. – SeaBean Oct 21 '21 at 19:11
  • @terraCoder After considering the options, come back and let us know which solution you finally chose. This is an interesting and general topic; I'm sure other members would be interested in your experience. – SeaBean Oct 21 '21 at 19:50
  • Hi @SeaBean, I went with the built-in approach. As you said, it's much faster for larger datasets, and it's clean as well. Thanks a lot!! – terraCoder Oct 21 '21 at 20:00
  • @terraCoder It's great that you found a very good solution. Yup, this approach runs much faster, thanks to the optimizations in Pandas, which use vectorized operations to provide parallel processing. – SeaBean Oct 21 '21 at 20:36
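The `max_level` option mentioned in the comments corresponds to the `max_level` parameter of `pd.json_normalize`: levels deeper than it are not flattened and are kept as dict values instead. A short sketch using the question's first example:

```python
import pandas as pd

d = {
    'product_list.show_date': "May '21",
    'product_list.new_users': {
        'product_list.product': {
            'A': None, 'B': 377, 'C': None, 'D': 67,
            'E': None, 'F': 1, 'G': None,
        }
    },
}

# max_level=1 flattens only one level below the top; the innermost dict
# is left intact as the value of the joined key.
partial = pd.json_normalize(d, max_level=1).to_dict('records')[0]
print(list(partial))
```

This can be useful when only the outer structure is dynamic and the innermost dicts should stay grouped.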