Original question: I am using Python 3. I have some 4-level and 5-level nested dictionaries. I want to convert such a multilevel dictionary into a pandas DataFrame with a recursive function.
To simplify my question and test my function, I generated the 3-level dictionary shown below and tried my recursive function on it. I understand that there are many other ways to solve the problem for a 3-level nested dictionary. However, I feel that only a recursive function can be easily applied to dictionaries with 4, 5 or more levels.
To create a simplified 3-level dictionary:
from collections import defaultdict
def ddict():
    return defaultdict(ddict)
tree = ddict()
tree['level1_1']['level2_1']['level3_1'] = <pd.Series1>
tree['level1_1']['level2_1']['level3_2'] = <pd.Series2>
tree['level1_1']['level2_2']['level3_1'] = <pd.Series3>
tree['level1_1']['level2_2']['level3_2'] = <pd.Series4>
tree['level1_2']['level2_1']['level3_1'] = <pd.Series5>
tree['level1_2']['level2_1']['level3_2'] = <pd.Series6>
tree['level1_2']['level2_2']['level3_1'] = <pd.Series7>
tree['level1_2']['level2_2']['level3_2'] = <pd.Series8>
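For anyone who wants to run this, here is a minimal sketch that fills the same 3-level tree with concrete values. The numbers in each pd.Series are hypothetical placeholders standing in for <pd.Series1> ... <pd.Series8>; the real data is not shown in this post.

```python
from collections import defaultdict

import pandas as pd


def ddict():
    # Every missing key creates another nested defaultdict
    return defaultdict(ddict)


tree = ddict()
counter = 0
for l1 in ('level1_1', 'level1_2'):
    for l2 in ('level2_1', 'level2_2'):
        for l3 in ('level3_1', 'level3_2'):
            counter += 1
            # Hypothetical placeholder data: two numbers per leaf
            tree[l1][l2][l3] = pd.Series([counter, counter * 10])

print(len(tree), len(tree['level1_1']), len(tree['level1_1']['level2_1']))
```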
Inspired by Bart Cubrich's answer below, I revised xx's code and am posting my solution here:
import collections.abc
import numpy as np
import pandas as pd

def tree2df(d, colname):
    """
    Inputs:
        1. d (a nested dict, or a tree; all leaf values are pd.Series)
        2. colname (a list with one name per nesting level)
    Return:
        1. a pd.DataFrame, or None if colname has the wrong length
    """
    def flatten(d, parent_key='', sep='-'):
        # Recursively join the nested keys into one 'a-b-c' string per leaf.
        items = []
        for k, v in d.items():
            new_key = str(parent_key) + str(sep) + str(k) if parent_key else str(k)
            # collections.MutableMapping was removed in Python 3.10; use collections.abc.
            if isinstance(v, collections.abc.MutableMapping):
                items.extend(flatten(v, new_key, sep=sep).items())
            else:
                items.append((new_key, v))
        return dict(items)

    flat_dict = flatten(d)
    # Note: splitting on '-' assumes that no key contains '-' itself.
    levels, vals = zip(*[(tuple(level.split('-')), val) for level, val in flat_dict.items()])
    max_level = np.max(np.array([len(l) for l in levels]))
    if len(colname) != max_level:
        print("The number of column names is invalid: it must equal the maximum level: %s.\nNothing will be returned. Please revise the colname!" % max_level)
    else:
        # Build a new list so the caller's colname is not modified.
        names = list(colname) + ['Old index']
        s = pd.concat(list(vals), keys=list(levels), names=names)
        s = pd.DataFrame(s)
        s.reset_index(inplace=True)
        s.rename(columns={0: 'Value'}, inplace=True)
        return s
# Example
BlockEvent_TS_df = tree2df(BlockEvent_TS_tree, ['ID', 'Session', 'Trial type', 'Block', 'Event name'])
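One caveat with joining keys into a string and splitting them again: it only works if no key contains the separator. A possible variation (a sketch, not the code I actually ran on my data) is to carry the path as a tuple during the recursion. The names flatten_to_tuples, toy and toy_df are made up for this illustration, and the toy values are placeholders.

```python
from collections.abc import Mapping

import pandas as pd


def flatten_to_tuples(node, path=()):
    """Yield (path, leaf) pairs; path is the tuple of keys leading to the leaf."""
    for key, value in node.items():
        if isinstance(value, Mapping):
            # Recurse into the sub-dictionary, extending the path
            yield from flatten_to_tuples(value, path + (key,))
        else:
            yield path + (key,), value


# Hypothetical toy tree whose keys contain '-' on purpose
toy = {
    'sub-01': {'pre': pd.Series([1.0, 2.0]), 'post': pd.Series([3.0, 4.0])},
    'sub-02': {'pre': pd.Series([5.0, 6.0]), 'post': pd.Series([7.0, 8.0])},
}

paths, leaves = zip(*flatten_to_tuples(toy))
toy_df = (
    pd.concat(list(leaves), keys=list(paths), names=['ID', 'Session', 'Old index'])
    .reset_index(name='Value')
)
print(toy_df)
```

Because the keys are never turned into one string, 'sub-01' stays intact in the ID column instead of being split into two levels.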
The 5-level nested dictionary follows the same idea as the 3-level one:
tree['level1_1']['level2_1']['level3_1']['level4_1']['level5_1'] = <pd.Series1>
...
tree['level1_2']['level2_2']['level3_2']['level4_2']['level5_2'] = <pd.Series32>
My dataset is large, so it is impractical to show the whole nested dictionary here, but the idea is the same. In the end I want a DataFrame with 6 columns: 5 columns to store the levels and one column for the value.
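To make the target layout concrete, here is a small sketch on a made-up 5-level tree with 32 leaves. It uses a different recursion from my function above: pd.concat accepts a dict and turns its keys into a new outer index level, so calling it recursively builds one index level per dictionary level. The helper name tree_to_series, the column names L1 ... L5 and the leaf values are assumptions for illustration only.

```python
import itertools

import pandas as pd


def tree_to_series(node):
    """Recursively concatenate a nested dict of pd.Series into one Series."""
    if isinstance(node, pd.Series):
        return node
    # A dict passed to pd.concat uses its keys as a new outer index level
    return pd.concat({key: tree_to_series(child) for key, child in node.items()})


# Hypothetical 5-level tree with two branches per level (2**5 = 32 leaves)
branches = [(f'level{i}_1', f'level{i}_2') for i in range(1, 6)]
tree5 = {}
for n, path in enumerate(itertools.product(*branches), start=1):
    node = tree5
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node[path[-1]] = pd.Series([n, -n])  # placeholder values

names = ['L1', 'L2', 'L3', 'L4', 'L5', 'Old index']
df5 = tree_to_series(tree5).rename_axis(names).reset_index(name='Value')

# Dropping the original Series index leaves 5 level columns plus the value
six_col = df5.drop(columns='Old index')
print(df5.shape, six_col.shape)
```

With two values per leaf this gives 64 rows. Keeping the 'Old index' column results in 7 columns, and dropping it gives the 6-column layout described above.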
I've tried the code above and it works well for me. The speed is also very decent.
Thanks for all your help!