0

I have a nested list of dictionaries and lists as below:

my_list = [{'a1':'1','b1':[{'c11':'2','d11':'3','e11':'4','f11':'5'},{'c12':'6','d12':'7','e12':'8','f12':'9'}],
'g1':'10','h1':'11'},

{'a2':'12','b2':[{'c21':'13','d21':'14','e21':'15','f21':'16'},{'c22':'17','d22':'18','e22':'19','f22':'20'}],
'g2':'21','h2':'22'},
.....,
.....
]

and I want to flatten the same as a dataframe like:


df = a1 c11 d11 e11 f11 c12 d12 e12 f12 g1 h1
     1   2  3   4   5   6   7   8   9   10 11
     12  13 14  15  16  17  18  19  20  21 22

how to flatten the list to DataFrame?

I tried looking at the sample code here Nested List to Pandas DataFrame with headers,

I've tried with json_normlize without success

Srinivas
  • 241
  • 2
  • 8

1 Answers1

2

You could iterate over the list of dictionaries first to flatten each row.

import re

def remove_first_digit(s):
    return re.sub(r'[0-9]', '', s, 1)

def flatten_row(d, flattened):
    """Recursively iterate through the dict d, saving any (str: str)
       key-value pairs we find in the flattened dict."""
    for k, v in d.items():
        if type(v) is dict:
            flatten_row(v, flattened)
        elif type(v) is list:
            for _v in v:
                flatten_row(_v, flattened)
        else:
            # Remove row index so col heading will match
            col_heading = remove_first_digit(k)  
            flattened[col_heading] = v
    return flattened

rows = []
for row in my_list:
    row = flatten_row(row, {})
    rows.append(row)

print(pd.DataFrame(rows))
    a  c1  d1  e1  f1  c2  d2  e2  f2   g   h
0   1   2   3   4   5   6   7   8   9  10  11
1  12  13  14  15  16  17  18  19  20  21  22

You would have to rename the column headings to give an exact match to your described output.

df = df.rename(columns={'a':'a1', 'c1': 'c11', 'd1': 'd11', 'e1': 'e11',
                        'f1': 'f11', 'c2': 'c12', 'd2': 'd12', 'e2': 'e12',
                        'f2': 'f12', 'g': 'g1', 'h': 'h1'})
Tom
  • 51
  • 1
  • 5