I have a dict of the form:
pd_dict = {'row_id_1': {'col_1': val1, 'col_2': val2},
           'row_id_2': {'col_1': val3, 'col_3': val4, 'col_4': val5},
           ...
           }
and I would like to turn this into a pandas DataFrame:
          col_1  col_2  col_3  col_4  ...
row_id_1  val1   val2   NaN    NaN
row_id_2  val3   NaN    val4   val5
...
The number of columns differs from row to row, and a given column may or may not appear in more than one row. I'd like the result to contain the union of all columns, with NaN wherever a row has no value for a column.
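To make the expected result concrete, here is a small self-contained sketch of the input and the frame I am after. The numeric values are placeholders I made up to stand in for val1..val5, and the name `expected` is only for illustration; my real data is much larger.

```python
import numpy as np
import pandas as pd

# Placeholder values standing in for val1..val5 (made up for illustration).
pd_dict = {
    'row_id_1': {'col_1': 1.0, 'col_2': 2.0},
    'row_id_2': {'col_1': 3.0, 'col_3': 4.0, 'col_4': 5.0},
}

# The result I want: one row per outer key, the union of all inner keys as
# columns, and NaN wherever a row has no entry for a column.
expected = pd.DataFrame(
    [[1.0, 2.0, np.nan, np.nan],
     [3.0, np.nan, 4.0, 5.0]],
    index=['row_id_1', 'row_id_2'],
    columns=['col_1', 'col_2', 'col_3', 'col_4'],
)
print(expected)
```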
I tried:
pd.DataFrame.from_dict(pd_dict, orient='index')
...but that doesn't give the correct output.
I also tried creating one DataFrame per row and then concatenating them:
frames = []
...
for k, cols in pd_dict.items():
    ...
    frames.append(pd.DataFrame.from_dict({k: list(cols.values())}, orient='index', columns=list(cols.keys())))
...
df = pd.concat(frames)
That works but it takes a very long time.
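In case it helps anyone reproduce the slowdown, here is a rough sketch that builds synthetic data and times the per-row concat approach. The sizes, the pool of column names, the random values, and the seed are all assumptions chosen for illustration (scaled down so it finishes quickly); they are not my real data.

```python
import random
import time

import pandas as pd

random.seed(0)

# Assumed, scaled-down sizes; increase them to approach the real workload.
n_rows, n_cols_per_row, n_col_pool = 200, 200, 400
col_pool = [f'col_{i}' for i in range(n_col_pool)]

# Each row gets a random subset of the column pool with random float values.
pd_dict = {
    f'row_id_{r}': {c: random.random() for c in random.sample(col_pool, n_cols_per_row)}
    for r in range(n_rows)
}

start = time.perf_counter()
# Same approach as above: one single-row DataFrame per key, then concat.
frames = [
    pd.DataFrame.from_dict({k: list(cols.values())}, orient='index', columns=list(cols.keys()))
    for k, cols in pd_dict.items()
]
df = pd.concat(frames)
elapsed = time.perf_counter() - start

print(f'built frame of shape {df.shape} in {elapsed:.2f}s')
```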
It's worth mentioning that my data has around 1000 rows and around 1000 columns per row, so performance matters. Thanks in advance!