
I have a dict of the form:

    pd_dict = {'row_id_1': {'col_1': val1, 'col_2': val2},
               'row_id_2': {'col_1': val3, 'col_3': val4, 'col_4': val5}
               ...
              }

and I would like to turn this into a pandas DataFrame:

            col_1    col_2    col_3    col_4    ...
row_id_1    val1     val2     NaN      NaN
row_id_2    val3     NaN      val4     val5
...

The number of columns per row differs. The same columns may or may not repeat on different rows. I'd like to merge all and fill in NaN values where appropriate.

I tried:

    pd.DataFrame.from_dict(pd_dict, orient='index')

...but that doesn't give the correct output.

I also tried creating one DataFrame per row and then concat-ing them like so:

    frames = []
    ...
    for k, cols in pd_dict.items():
        ...
        frames.append(pd.DataFrame.from_dict({k: list(cols.values())}, orient='index', columns=list(cols.keys())))
        ...
    df = pd.concat(frames)

That works but it takes a very long time.

It's worth mentioning that my data has around 1000 rows and around 1000 columns per row, so performance might become an issue. Thanks in advance!

capitan

2 Answers


This is because the inner dicts have different lengths.

    pd.Series(pd_dict).apply(pd.Series)
BENY

You can do the following:

    df = pd.DataFrame(pd_dict).T
    print(df)
    #         col_1 col_2 col_3 col_4
    #row_id_1  val1  val2   NaN   NaN
    #row_id_2  val3   NaN  val4  val5
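
As a minimal, self-contained sketch of the transpose approach: the integer values below are made up for illustration, since the question leaves `val1` through `val5` unspecified. The column order is deliberately not relied on here, because it can differ between pandas versions.

```python
import pandas as pd

# Illustrative values only; the question does not say what val1..val5 are.
pd_dict = {
    "row_id_1": {"col_1": 1, "col_2": 2},
    "row_id_2": {"col_1": 3, "col_3": 4, "col_4": 5},
}

# Outer keys become columns and inner keys become the index,
# so transposing puts the row ids back on the index.
df = pd.DataFrame(pd_dict).T

# Boolean frame marking the cells whose key was absent from that row's dict.
missing = df.isna()
```

Note that if the values have mixed types, transposing may leave the columns with `object` dtype, so a conversion step might be needed afterwards.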

Also, your original attempt would work if you sorted the columns:

    print(pd.DataFrame.from_dict(pd_dict, orient='index').sort_index(axis=1))
    #         col_1 col_2 col_3 col_4
    #row_id_1  val1  val2   NaN   NaN
    #row_id_2  val3   NaN  val4  val5
pault
  • I tried sort_index as per your suggestion, but it didn't work. Using transpose did work, but it doesn't keep the order of the columns (they appear to be sorted alphabetically). Is there a way to keep the order? – capitan Jul 22 '19 at 17:33
  • @capitan What version of Python? Before Python 3.6, dictionaries are unordered. [From 3.6 on they maintain insertion order](https://stackoverflow.com/questions/39980323/are-dictionaries-ordered-in-python-3-6/39980744). Dictionary key ordering aside, your question is a bit ambiguous, as not all columns appear in all the dictionaries. How do you determine the correct order? – pault Jul 22 '19 at 17:36
  • I'm using Python 3.7. All rows have the same first 3 columns (the first keys in the dict values of pd_dict), so I'd like those to appear first. After that I don't really care about the order. – capitan Jul 22 '19 at 17:39
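
One possible way to get the ordering described in the last comment, as a sketch rather than a definitive answer: build the frame first, then select the columns with the shared leading ones placed in front. The column names (`id`, `name`, `kind`, and so on) and values are hypothetical, and the code assumes, as the comment states, that every row starts with the same three keys and that dicts keep insertion order (Python 3.7+).

```python
import pandas as pd

# Hypothetical data: every row starts with the same three keys.
pd_dict = {
    "row_id_1": {"id": 1, "name": "a", "kind": "x", "zeta": 0.5},
    "row_id_2": {"id": 2, "name": "b", "kind": "y", "alpha": 7, "beta": 8},
}

df = pd.DataFrame.from_dict(pd_dict, orient="index")

# Take the shared leading columns from the first row's keys.
leading = list(next(iter(pd_dict.values())))[:3]

# Keep the remaining columns in whatever order pandas produced them.
rest = [c for c in df.columns if c not in leading]

# Selecting with an explicit list fixes the column order.
df = df[leading + rest]
```

Because the order is set explicitly at the end, this does not depend on how the constructor happens to order the columns.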