Flatten list of dicts

Question

I have a pandas dataframe (sample) as follows

df = pd.DataFrame({'Country':['India', 'China', 'Nepal'],
          'Habitat':[[{'city1':'Ind1','city2':'Ind2'},{'town1':'IndT1','town2':'IndT2'}],
                     [{'city1':'Chi1','city2':'Chi2'},{'town1':'ChiT1','town2':'ChiT2'}],
                     [{'city1':'Nep1','city2':'Nep2'},{'town1':'NepT1','town2':'NepT2'}]],
            'num':[1,2,3]
          })

df

    Country                                                           Habitat   num
0   India   [{'city1':'Ind1','city2':'Ind2'},{'town1':'IndT1','town2':'IndT2'}] 1
1   China   [{'city1':'Chi1','city2':'Chi2'},{'town1':'ChiT1','town2':'ChiT2'}] 2
2   Nepal   [{'city1':'Nep1','city2':'Nep2'},{'town1':'NepT1','town2':'NepT2'}] 3

I need to flatten this out in this format.

result_df = pd.DataFrame({'Country':['India', 'China', 'Nepal'],
          'Habitat.city1':['Ind1','Chi1','Nep1'],
            'Habitat.city2':['Ind2','Chi2','Nep2'],
            'Habitat.town1':['IndT1','ChiT1','NepT1'],
            'Habitat.town2':['IndT2','ChiT2','NepT2'],
            'num':[1,2,3]
          })

result_df

    Country Habitat.city1   Habitat.city2   Habitat.town1   Habitat.town2   num
    India       Ind1            Ind2            IndT1           IndT2       1
    China       Chi1            Chi2            ChiT1           ChiT2       2
    Nepal       Nep1            Nep2            NepT1           NepT2       3

I have tried pd.json_normalize(df.explode('Habitat')['Habitat]) but it creates new rows which I do not need.

My observation: Some form of groupby and transpose that can properly build on pd.json_normalize(df.explode('Habitat')['Habitat]) could solve my problem but so far I have not had any luck

Is this coming for a source JSON, can we flatten before the creation of this dataframe? — Scott Boston, Jul 18 '22 at 15:44
The source is a json file. But the files can get huge, so iterating per row may get very expensive — Charizard_knows_to_code, Jul 18 '22 at 15:46
working with the json file, before dumping into Pandas should be more performant — sammywemmy, Jul 18 '22 at 23:15

score 5 · Answer 1 · answered Jul 18 '22 at 15:51

Let us use ChainMap to merge the list of dictionaries in each row, then create a new dataframe and join back with original dataframe

from itertools import starmap
from collections import ChainMap

h = pd.DataFrame(starmap(ChainMap, df['Habitat']), df.index)
df.join(h.add_prefix('Habitat.'))

  Country                                                                     Habitat  num Habitat.city1 Habitat.city2 Habitat.town1 Habitat.town2
0   India  [{'city1': 'Ind1', 'city2': 'Ind2'}, {'town1': 'IndT1', 'town2': 'IndT2'}]    1          Ind1          Ind2         IndT1         IndT2
1   China  [{'city1': 'Chi1', 'city2': 'Chi2'}, {'town1': 'ChiT1', 'town2': 'ChiT2'}]    2          Chi1          Chi2         ChiT1         ChiT2
2   Nepal  [{'city1': 'Nep1', 'city2': 'Nep2'}, {'town1': 'NepT1', 'town2': 'NepT2'}]    3          Nep1          Nep2         NepT1         NepT2

Dani Mesejo · Answer 2 · 2022-07-18T16:06:40.963

In Python 3.9+ you can use the dictionary union, as follows:

import pandas as pd
from operator import or_
from itertools import starmap
    
    
flat = pd.DataFrame(starmap(or_, df['Habitat']), df.index).add_prefix('Habitat.')
res = pd.concat([df.drop(labels=['Habitat'], axis=1), flat], axis=1)
print(res)

Output

Country  num Habitat.city1 Habitat.city2 Habitat.town1 Habitat.town2
0   India    1          Ind1          Ind2         IndT1         IndT2
1   China    2          Chi1          Chi2         ChiT1         ChiT2
2   Nepal    3          Nep1          Nep2         NepT1         NepT2

The operator or_ will call the | implementation of the objects under-the-hood, from the documentation:

Return the bitwise or of a and b.

For the case of dictionaries the | is the union. For more information on how to merge two dictionaries see this answer.

An alternative solution that works for any number of dictionaries, is to use functools.reduce:

import pandas as pd
from operator import or_
from functools import reduce, partial

merge = partial(reduce, or_)

flat = pd.DataFrame(map(merge, df['Habitat']), df.index).add_prefix('Habitat.')
res = pd.concat([df.drop(labels=['Habitat'], axis=1), flat], axis=1)

For more information see partial and reduce.

Flatten list of dicts

2 Answers2