Dict = {'Things' : {'Car':'Lambo', 'Home':'NatureVilla', 'Gadgets':{'Laptop':{'Programs':{'Data':'Excel', 'Officework': 'Word', 'Coding':{'Python':'PyCharm', 'Java':'Eclipse', 'Others': 'SublimeText'}, 'Wearables': 'SamsungGear', 'Smartphone': 'Nexus'}, 'clothes': 'ArmaaniSuit', 'Bags':'TravelBags'}}}}



d = {(i,j,k,l,m,n): Dict[i][j][k][l][m][n]
     for i in Dict.keys()
     for j in Dict[i].keys()
     for k in Dict[j].keys()
     for l in Dict[k].keys()
     for m in Dict[l].keys()
     for n in Dict[n].keys()
     }

mux = pd.MultiIndex.from_tuples(d.keys())
df = pd.DataFrame(list(d.values()), index=mux)
print (df)

What I have already done: I tried to build a MultiIndex from this irregular data with pandas, but I get a KeyError at 'Car'. Then I tried to catch the exception and pass it, but that results in a SyntaxError. So maybe I have lost direction. Is there any other module or way to index this irregular data and put it in a table somehow? I have a large chunk of raw data like this.

What I am trying to do: I want to display this data in a QTableView from PyQt5 (I am making a program with a GUI).

Conditions: This data is updated every hour from an API.

What I have thought of so far: maybe I can append all this data to MySQL. But when the data updates from the API, only the values change; the keys stay the same. So appending every update would require more and more space.

References: How to convert a 3-level dictionary to a desired format?

How to build a MultiIndex Pandas DataFrame from a nested dictionary with lists

Any help will be appreciated. Thanks for reading the question.

Pratik

2 Answers


Your information looks a lot like JSON, which is probably what the API is returning. If that's the case and you are turning it into a dictionary yourself, you might be better off using Python's json library or even pandas' built-in read_json function.

Pandas read json

Python's json
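A minimal sketch of that idea, assuming the API returns JSON text shaped like Dict (the raw string below is a made-up, shortened stand-in, not real API output). Note that it swaps in pandas.json_normalize (top-level since pandas 1.0) rather than read_json, because json_normalize is the pandas function meant for flattening nested objects:

```python
import json

import pandas as pd

# Hypothetical, shortened API response with the same nesting as Dict in the question
raw = ('{"Things": {"Car": "Lambo", "Home": "NatureVilla", '
       '"Gadgets": {"Laptop": {"Programs": {"Data": "Excel", '
       '"Coding": {"Python": "PyCharm", "Java": "Eclipse"}}, "Bags": "TravelBags"}}}}')

data = json.loads(raw)  # JSON text -> nested Python dicts

# json_normalize flattens nested dicts into one wide row;
# each column name is the key path joined with `sep`
wide = pd.json_normalize(data, sep='/')

# Transpose the single wide row into a long two-column table: key path -> value
table = wide.T.reset_index()
table.columns = ['path', 'value']
print(table)
```

Each value becomes one row keyed by its full path, which sidesteps the question of padding missing levels; the trade-off is that the path is one string rather than separate columns.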

randyjp
  • Thanks for the answer. I went through the other answers and figured out that I need to flatten the data. Plus, flattening allows sorting, too. But I would love to explore the json module for handling further data, too. – Pratik Mar 02 '18 at 03:51

Your data is not actually a regular 6-level dictionary in the way the dictionary in the 3-level example you referenced is a regular 3-level one. The difference is that your dictionary has values on several different levels: e.g. the 'Lambo' value is on the second level of the hierarchy, with key ('Things','Car'), but the 'Eclipse' value is on the sixth level, with key ('Things','Gadgets','Laptop','Programs','Coding','Java').
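To see this irregularity concretely, here is a small sketch that walks the dictionary and reports how many keys lead to each value. leaf_paths is a hypothetical helper, and src_dict is simply the question's Dict under a lowercase name:

```python
src_dict = {
    'Things': {
        'Car': 'Lambo',
        'Home': 'NatureVilla',
        'Gadgets': {
            'Laptop': {
                'Programs': {
                    'Data': 'Excel',
                    'Officework': 'Word',
                    'Coding': {'Python': 'PyCharm', 'Java': 'Eclipse', 'Others': 'SublimeText'},
                    'Wearables': 'SamsungGear',
                    'Smartphone': 'Nexus',
                },
                'clothes': 'ArmaaniSuit',
                'Bags': 'TravelBags',
            }
        },
    }
}


def leaf_paths(d, prefix=()):
    """Hypothetical helper: yield (key_path, value) for every non-dict value."""
    for key, value in d.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from leaf_paths(value, path)  # descend into a sub-dictionary
        else:
            yield path, value  # a plain value: report its full key path


for path, value in leaf_paths(src_dict):
    print(len(path), path, value)
```

For the question's data the path lengths come out as 2, 4, 5 and 6, which is why a single fixed six-level key does not fit every value without some kind of filler.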

If you want to 'flatten' your structure, you will need to decide what to do with the missing deeper-level keys for values like 'Lambo'.

Btw, flattening may not actually be the right solution for your problem; maybe you need a UI widget better suited to hierarchical data, such as a TreeView (QTreeView in PyQt5). But I will try to address your exact question directly.

Unfortunately, there seems to be no easy way to reference values on all these different levels uniformly in one simple dict or list comprehension. Just look at your 'value extractor' (Dict[i][j][k][l][m][n]): there are no values of i, j, k, l, m, n that would give you 'Lambo', because to get a Lambo you just need Dict['Things']['Car'] (ironically, in real life it can also be difficult to get a Lambo :-) ). The KeyError at 'Car' itself comes from the line for k in Dict[j].keys(): it looks 'Car' up in the top-level dictionary instead of in Dict[i].
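The same point as a sketch: below is the question's six-loop comprehension with the indexing fixed and a hypothetical children guard added, so that plain values are skipped instead of raising. It runs, but it can only ever collect values that sit exactly six keys deep:

```python
src_dict = {
    'Things': {
        'Car': 'Lambo',
        'Home': 'NatureVilla',
        'Gadgets': {
            'Laptop': {
                'Programs': {
                    'Data': 'Excel',
                    'Officework': 'Word',
                    'Coding': {'Python': 'PyCharm', 'Java': 'Eclipse', 'Others': 'SublimeText'},
                    'Wearables': 'SamsungGear',
                    'Smartphone': 'Nexus',
                },
                'clothes': 'ArmaaniSuit',
                'Bags': 'TravelBags',
            }
        },
    }
}


def children(node):
    # Hypothetical helper: a dict's (key, value) pairs, or nothing for a plain value
    return node.items() if isinstance(node, dict) else ()


# The question's six nested loops, each one iterating the previous level's value.
# No KeyError or TypeError now, but only values exactly six keys deep are reached.
six_deep = {(i, j, k, l, m, n): v6
            for i, v1 in children(src_dict)
            for j, v2 in children(v1)
            for k, v3 in children(v2)
            for l, v4 in children(v3)
            for m, v5 in children(v4)
            for n, v6 in children(v5)}
print(six_deep)
```

With the question's data only the three Coding entries survive; 'Lambo', 'Word' and the other shallower values are silently dropped.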

One straightforward way to solve your task is to extract the second-level values, then the third-level values, and so on, and combine them. E.g., to extract the second-level values you can write something like this:

# check that Dict[k1] is a dict before iterating over it
val_level2 = {(k1, k2): Dict[k1][k2]
              for k1 in Dict if isinstance(Dict[k1], dict)
              for k2 in Dict[k1] if not isinstance(Dict[k1][k2], dict)}

But if you want to combine them later with the sixth-level values, you will need to pad the key tuples to the same length:

# pad with four '' so every key tuple has 6 elements, like the deepest keys
val_level2 = {(k1, k2, '', '', '', ''): Dict[k1][k2]
              for k1 in Dict if isinstance(Dict[k1], dict)
              for k2 in Dict[k1] if not isinstance(Dict[k1][k2], dict)}

Later you can combine them all with something like:

d = {}
d.update(val_level2)
d.update(val_level3)  # ...then val_level4, val_level5, val_level6, each built the same way
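The level-by-level approach can be generalized so that you don't have to write val_level2 through val_level6 by hand. This is only a sketch: values_at_level and its width parameter are hypothetical names, and width=6 is assumed to be the number of keys leading to the deepest value:

```python
src_dict = {
    'Things': {
        'Car': 'Lambo',
        'Home': 'NatureVilla',
        'Gadgets': {
            'Laptop': {
                'Programs': {
                    'Data': 'Excel',
                    'Officework': 'Word',
                    'Coding': {'Python': 'PyCharm', 'Java': 'Eclipse', 'Others': 'SublimeText'},
                    'Wearables': 'SamsungGear',
                    'Smartphone': 'Nexus',
                },
                'clothes': 'ArmaaniSuit',
                'Bags': 'TravelBags',
            }
        },
    }
}


def values_at_level(src, level, width):
    """Hypothetical helper: {padded key tuple: value} for plain values exactly `level` keys deep."""
    frontier = [((), src)]  # (key path so far, node) pairs
    for _ in range(level - 1):
        # step one level down, expanding only nodes that are dictionaries
        frontier = [(path + (k,), v)
                    for path, node in frontier if isinstance(node, dict)
                    for k, v in node.items()]
    result = {}
    for path, node in frontier:
        if not isinstance(node, dict):
            continue
        for k, v in node.items():
            if not isinstance(v, dict):  # a plain value exactly `level` keys deep
                key = path + (k,)
                result[key + ('',) * (width - len(key))] = v  # pad to `width`
    return result


d = {}
for level in range(1, 7):  # levels 1 to 6
    d.update(values_at_level(src_dict, level, width=6))
print(len(d))
```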

But usually the most natural way to work with hierarchical data is recursion, like this:

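def flatten_dict(d, key_prefix, max_deep):
    # Plain values on this level: key path padded with '' to max_deep + 1 elements
    return [(tuple(key_prefix + [k] + [''] * (max_deep - len(key_prefix))), v)
            for k, v in d.items() if not isinstance(v, dict)] + \
        sum([flatten_dict(v, key_prefix + [k], max_deep)  # recurse into sub-dicts
             for k, v in d.items() if isinstance(v, dict)], [])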
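If you prefer, the same recursion can be written as a generator, which avoids sum(..., []) building a new list at every addition, and the padding width can be computed instead of being passed in as 5. This is a sketch with hypothetical names (iter_leaves, max_depth), not a drop-in replacement: here width is the full tuple length (6, not 5), and pairs come out in plain dict order rather than with each level's plain values listed before its sub-dictionaries:

```python
src_dict = {
    'Things': {
        'Car': 'Lambo',
        'Home': 'NatureVilla',
        'Gadgets': {
            'Laptop': {
                'Programs': {
                    'Data': 'Excel',
                    'Officework': 'Word',
                    'Coding': {'Python': 'PyCharm', 'Java': 'Eclipse', 'Others': 'SublimeText'},
                    'Wearables': 'SamsungGear',
                    'Smartphone': 'Nexus',
                },
                'clothes': 'ArmaaniSuit',
                'Bags': 'TravelBags',
            }
        },
    }
}


def max_depth(d):
    """Number of keys on the longest path from the root to a plain value."""
    return 1 + max((max_depth(v) for v in d.values() if isinstance(v, dict)), default=0)


def iter_leaves(d, width, prefix=(), fill=''):
    """Hypothetical generator: yield (key tuple padded to `width`, value) pairs."""
    for k, v in d.items():
        path = prefix + (k,)
        if isinstance(v, dict):
            yield from iter_leaves(v, width, path, fill)  # recurse into a sub-dict
        else:
            yield path + (fill,) * (width - len(path)), v


width = max_depth(src_dict)  # 6 for the question's data
flat = dict(iter_leaves(src_dict, width))
print(flat)
```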

And then use it with code like this:

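import pandas as pd

# max_deep=5: the deepest values have 6 keys, so every key tuple gets 6 elements
d = dict(flatten_dict(Dict, [], 5))
mux = pd.MultiIndex.from_tuples(d.keys())
df = pd.DataFrame(list(d.values()), index=mux)
print(df.reset_index())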
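Since sorting came up, here is a sketch that reuses flatten_dict from above, names the index levels and sorts by the key path before turning the index into ordinary columns. The level names level_1 to level_6 and the 'value' column name are assumptions; pick whatever headers you want to show:

```python
import pandas as pd

src_dict = {
    'Things': {
        'Car': 'Lambo',
        'Home': 'NatureVilla',
        'Gadgets': {
            'Laptop': {
                'Programs': {
                    'Data': 'Excel',
                    'Officework': 'Word',
                    'Coding': {'Python': 'PyCharm', 'Java': 'Eclipse', 'Others': 'SublimeText'},
                    'Wearables': 'SamsungGear',
                    'Smartphone': 'Nexus',
                },
                'clothes': 'ArmaaniSuit',
                'Bags': 'TravelBags',
            }
        },
    }
}


def flatten_dict(d, key_prefix, max_deep):
    # same function as in the answer: (padded key tuple, value) pairs
    return [(tuple(key_prefix + [k] + [''] * (max_deep - len(key_prefix))), v)
            for k, v in d.items() if not isinstance(v, dict)] + \
        sum([flatten_dict(v, key_prefix + [k], max_deep)
             for k, v in d.items() if isinstance(v, dict)], [])


d = dict(flatten_dict(src_dict, [], 5))
level_names = [f'level_{i}' for i in range(1, 7)]  # assumed column headers
mux = pd.MultiIndex.from_tuples(list(d), names=level_names)
df = pd.DataFrame({'value': list(d.values())}, index=mux)

# sort_index orders rows lexicographically by key path;
# reset_index turns the index levels into ordinary columns
table = df.sort_index().reset_index()
print(table)
```

The result is a plain 11-row table, one row per value. To show it in a QTableView you would still need a Qt model on top of it, for example a QAbstractTableModel subclass.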

I actually get this result with your data:

[Screenshot: resulting data_frame]

P.S. According to https://www.python.org/dev/peps/pep-0008/#prescriptive-naming-conventions we prefer lowercase_with_underscores for variable names; CapWords is for classes. So src_dict would be a much better name than Dict in your case.

  • Your answer made me think the problem through again and again. Flattening might be helpful for my data because I will need to sort it, too. Your answer seems to be a perfect fit for what I was looking for. Thank you. I will keep tinkering in Python! – Pratik Mar 02 '18 at 03:10
  • Thank you for your feedback. It will motivate me to help others. Python is amazing; if you keep using it, you can't help falling in love with it :-) – Recontemplator Mar 02 '18 at 09:57