0

I am currently struggling with extracting/flatten data from hugely nested dictionary: Flattening a nested dictionary with unique keys for each dictionary? .

I received a somewhat acceptable response, but do now have problems in relation to applying that methodology to another dictionary. So far I have gotten to a point where I have the following DataFrame.

First I would concatenate the values of "this_should_be_columns" + '_' + "child_column_name", (not a problem)

What I want is for all the unique values in ("this_should_be_columns"_"child_column_name") to become headers, and the rows should be there corresponding value (column "0"). Any ideas/solutions would be much appreciated!

FYI, my dictionary looks as follows:

{'7454': 
  {'coach': 
    {'wyId': 562711, 'shortName': 'Name1', 'firstName': 'N1', 'middleName': '', 'lastName': 'N2', 
     'birthDate': None, 
     'birthArea': 
       {'id': 208, 'alpha2code': 'DK', 'alpha3code': 'DNK', 'name': 'Denmark'}, 
     'passportArea': 
       {'id': 208, 'alpha2code': 'DK', 'alpha3code': 'DNK', 'name': 'Denmark'}, 
     'currentTeamId': 
        7454, 'gender': 'male', 'status': 'active'}}, 
 '7453': 
  {'coach': 
    {'wyId': 56245, 'shortName': 'Name2', 'firstName': 'N3', 'middleName': '', 'lastName': 'N4', 
     'birthDate': 'yyyy-mm-dd', 
     'birthArea': 
       {'id': 208, 'alpha2code': 'DK', 'alpha3code': 'DNK', 'name': 'Denmark'}, 
     'passportArea': 
       {'id': 208, 'alpha2code': 'DK', 'alpha3code': 'DNK', 'name': 'Denmark'}, 
     'currentTeamId': 
         7453, 'gender': 'male', 'status': 'active'}}}

The code looks as follows:

df_test = pd.DataFrame(pd.Series(responses).apply(pd.Series).stack().apply(pd.Series).stack().apply(pd.Series).stack().apply(pd.Series).stack().apply(pd.Series).stack().apply(pd.Series).stack().apply(pd.Series).reset_index().rename(columns={'level_0': 'teamId', 'level_1': 'type', 'level_2': 'this_should_be_columns', 'level_3': 'child_column_name', 'level_4': 'firstname', 'level_5' :'middleName', 'level_6' : 'ignore'}))
del df_test['firstname']
del df_test['middleName']
del df_test['ignore']
print(df_test)
augen123
  • 1
  • 1

1 Answers1

0

The problem is that your dictionaries have a different number of levels. 'birthArea' and 'passportArea' contain dictionaries while the other keys simply contain values. You can use pd.json_normalize() to flatten the keys of the innermost dictionary as described in Flatten nested dictionaries, compressing keys.

In [37]: pd.DataFrame(responses).stack().apply(lambda x: pd.json_normalize(x, sep='_').to_dict(orient='records')[0]).apply(pd.Series).stack().reset_index()                                                
Out[37]: 
   level_0 level_1                  level_2        0
0    coach    7454                     wyId   562711
1    coach    7454                shortName    Name1
2    coach    7454                firstName       N1
3    coach    7454               middleName         
4    coach    7454                 lastName       N2
..     ...     ...                      ...      ...
28   coach    7453           birthArea_name  Denmark
29   coach    7453          passportArea_id      208
30   coach    7453  passportArea_alpha2code       DK
31   coach    7453  passportArea_alpha3code      DNK
32   coach    7453        passportArea_name  Denmark
maow
  • 2,712
  • 1
  • 11
  • 25