Create a pandas DataFrame from a nested dict with row indices as dict keys and a dict with different columns per key

Question

I have a dict of the form:

    pd_dict = {'row_id_1': {'col_1': val1, 'col_2': val2},
               'row_id_2': {'col_1': val3, 'col_3': val4, 'col_4': val5}
               ...
              }

and I would like to turn this into a pandas DataFrame:

            col_1    col_2    col_3    col_4    ...
row_id_1    val1     val2     NaN      NaN
row_id_2    val3     NaN      val4     val5
...

The number of columns per row differs. The same columns may or may not repeat on different rows. I'd like to merge all and fill in NaN values where appropriate.

I tried:

pd.DataFrame.from_dict(pd_dict, orient='index')

...but that doesn't give the correct output.

I also tried creating one DataFrame per row and then concat-ing them like so:

frames = []
...
for k, cols in pd_dict.items():
    ...
    frames.append(pd.DataFrame.from_dict({k: list(cols.values())}, orient='index', columns=list(cols.keys())))
    ...
df = pd.concat(frames)

That works but it takes a very long time.

It's worth mentioning that my data has around 1000 rows and 1000 columns per row so performance might become an issue. Thanks in advance!

score 1 · Accepted Answer · answered Jul 22 '19 at 15:45

This happens because the inner dicts have different lengths. Expanding each inner dict into its own Series aligns the columns:

pd.Series(pd_dict).apply(pd.Series)
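For illustration, here is a minimal runnable sketch of the one-liner above. The string values are placeholders I made up to stand in for `val1` ... `val5` from the question. Note that this approach builds one `Series` per row, so it may not be the fastest option for roughly 1000 x 1000 inputs; that is worth measuring on the real data.

```python
import pandas as pd

# Hypothetical sample data; the strings stand in for val1..val5 in the question.
pd_dict = {
    "row_id_1": {"col_1": "val1", "col_2": "val2"},
    "row_id_2": {"col_1": "val3", "col_3": "val4", "col_4": "val5"},
}

# Outer Series: one element (an inner dict) per row id.
# apply(pd.Series) expands each inner dict into a row; keys that a row
# does not have end up as NaN in that row.
df = pd.Series(pd_dict).apply(pd.Series)
print(df)
```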

BENY


pault · Answer 2 · 2019-07-22T15:52:55.617

1

You can do the following:

df = pd.DataFrame(pd_dict).T
print(df)
#         col_1 col_2 col_3 col_4
#row_id_1  val1  val2   NaN   NaN
#row_id_2  val3   NaN  val4  val5

Your original attempt would also work if you sorted the columns by label:

print(pd.DataFrame.from_dict(pd_dict, orient='index').sort_index(axis=1))
#         col_1 col_2 col_3 col_4
#row_id_1  val1  val2   NaN   NaN
#row_id_2  val3   NaN  val4  val5
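As a self-contained sketch of both variants in this answer, the snippet below builds the frame both ways and sorts the columns by label so the two results can be compared side by side. The sample values are placeholders I chose for `val1` ... `val5`; `axis=1` is passed by keyword because recent pandas versions no longer accept it positionally.

```python
import pandas as pd

# Hypothetical sample data; the strings stand in for val1..val5 in the question.
pd_dict = {
    "row_id_1": {"col_1": "val1", "col_2": "val2"},
    "row_id_2": {"col_1": "val3", "col_3": "val4", "col_4": "val5"},
}

# Variant 1: outer keys become columns first, then transpose so they become rows.
via_transpose = pd.DataFrame(pd_dict).T.sort_index(axis=1)

# Variant 2: treat outer keys as the row index directly.
via_from_dict = pd.DataFrame.from_dict(pd_dict, orient="index").sort_index(axis=1)

print(via_transpose)
print(via_from_dict)
```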

edited Jul 22 '19 at 15:52

answered Jul 22 '19 at 15:46

pault


I tried sort_index as per your suggestion but it didn't work. Using transpose did work, but it doesn't keep the order of the columns (they appear to be sorted alphabetically). Is there a way to keep the order? – capitan Jul 22 '19 at 17:33
@capitan What version of Python? Before Python 3.6, dictionaries are unordered. [Post 3.6 they maintain insertion order](https://stackoverflow.com/questions/39980323/are-dictionaries-ordered-in-python-3-6/39980744). Dictionary key ordering aside, your question is a bit ambiguous, as not all columns appear in all the dictionaries. How do you determine the correct order? – pault Jul 22 '19 at 17:36
I'm using Python 3.7. All rows have the same first 3 columns (the first keys in the dict values of pd_dict), so I'd like those to appear first. After that I don't really care about the order. – capitan Jul 22 '19 at 17:39
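One possible way to get the ordering described in the last comment, sketched under the assumption that "first appearance across rows" is an acceptable order: collect the column names in the order they are first seen, then reindex the frame to that order. The column names (`key_a`, `key_b`, `key_c`) and values are invented for the example, and it relies on dicts preserving insertion order (Python 3.7+).

```python
import pandas as pd

# Hypothetical data: every row starts with the same three keys, as described
# in the comment, followed by row-specific columns.
pd_dict = {
    "row_id_1": {"key_a": "a1", "key_b": "b1", "key_c": "c1", "col_2": "val2"},
    "row_id_2": {"key_a": "a2", "key_b": "b2", "key_c": "c2",
                 "col_1": "val3", "col_3": "val4"},
}

# dict.fromkeys keeps the first occurrence of each key in insertion order,
# so this yields the columns in order of first appearance across all rows.
ordered_cols = list(dict.fromkeys(col for row in pd_dict.values() for col in row))

# Build the frame, then force the column order explicitly.
df = pd.DataFrame.from_dict(pd_dict, orient="index").reindex(columns=ordered_cols)
print(df)
```

Because the shared keys come first in every inner dict, they also come first in `ordered_cols`; the remaining columns follow in whatever order they are first encountered.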
