2

I have a nested dictionary with 3 levels,

example_d = {'attribute_001': {'colour': {'blue': 5, 'green': 5, 'red': 5},
                   'country': {'France': 3, 'Germany': 3, 'India': 3, 'UK': 3, 'USA': 3}},
  'attribute_002': {'colour': {'blue': 5, 'green': 5, 'red': 5},
                   'country': {'France': 3, 'Germany': 3, 'India': 3, 'UK': 3, 'USA': 3}},
  'attribute_003': {'colour': {'blue': 5, 'green': 5, 'red': 5},
                   'country': {'France': 3, 'Germany': 3, 'India': 3, 'UK': 3, 'USA': 3}},
  'attribute_004': {'colour': {'blue': 5, 'green': 5, 'red': 5},
                   'country': {'France': 3, 'Germany': 3, 'India': 3, 'UK': 3, 'USA': 3}},
  'attribute_005': {'colour': {'blue': 5, 'green': 5, 'red': 5}, 
                   'country': {'France': 3, 'Germany': 3, 'India': 3, 'UK': 3, 'USA': 3}}}

and I want to move this into a pandas dataframe such that the row-index is sourced at the first level of my dictionary, and use the remaining levels as hierarchical column indices.

I can get close by using the following, adapting an answer from here:

pd.concat({key:pd.DataFrame.from_dict(example_d[key],orient='columns')
              for key in example_d.keys()}).unstack(1)

which gives me:

enter image description here

But I need the lowest level of the multi-level column indices to respect their parents.

i.e. under the colour heading, I want only colour-columns to appear, and under the country heading, I want to see only country-columns.

Thomas Kimber
  • 10,601
  • 3
  • 25
  • 42

2 Answers2

2

First change dictionary, pass to Series constructor and reshape by Series.unstack:

reform = {(level1_key, level2_key, level3_key): values
             for level1_key, level2_dict in example_d.items()
             for level2_key, level3_dict in level2_dict.items()
             for level3_key, values in level3_dict.items()}

df = pd.Series(reform).unstack(level=[1,2])
print (df)
              colour           country                     
                blue green red  France Germany India UK USA
attribute_001      5     5   5       3       3     3  3   3
attribute_002      5     5   5       3       3     3  3   3
attribute_003      5     5   5       3       3     3  3   3
attribute_004      5     5   5       3       3     3  3   3
attribute_005      5     5   5       3       3     3  3   3
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Both answers work, but this one has the edge for me as it sets out a clear approach for dealing with not only this particular example, but with only some simple adjustment, could also be more generally applied to other nested dictionaries of different depth - by constructing tuple-keyed indices - which can then be managed using `stack` and `unstack`. – Thomas Kimber May 12 '19 at 18:39
1

IIUC using concat

df= pd.DataFrame(example_d).T
pd.concat([df[x].apply(pd.Series) for x in list(df)],1,keys=list(df))
Out[540]: 
              colour           country                     
                blue green red  France Germany India UK USA
attribute_001      5     5   5       3       3     3  3   3
attribute_002      5     5   5       3       3     3  3   3
attribute_003      5     5   5       3       3     3  3   3
attribute_004      5     5   5       3       3     3  3   3
attribute_005      5     5   5       3       3     3  3   3
BENY
  • 317,841
  • 20
  • 164
  • 234
  • Both these answers are neat in their own way - this one most closely follows the stacked construction I had started with. – Thomas Kimber May 12 '19 at 18:35