When I save a multiindex and multicolumn dataframe to excel. I have two options. With and without merge_cells=False
. This SO Question explains that quite nicely:
Avoid merged cells in pandas to_excel method
If I do use merge_cells=False
, I would have to use ffil
to fill up the merged cells with the previous cell which might be a bit dangerous because if there are truly Nans (not emerging from merged cell), they might be (falsely) overridden.
I could generat a tidy format by using df.to_excel(writer, merge_cells=False)
A multiindex dataframe like this one (https://pastebin.com/Me0QhwUy):
AA1 AA2 CB1 BB2 CC1
a a b b c d
ng/mL N/A N/A mL N/A EU/mL
0 1
A 1 1 1 1 1 1 1
2 1 1 1 1 1 1
B 1 1 1 1 1 1 1
2 1 1 1 1 1 1
C 1 1 1 1 1 1 1
2 1 1 1 1 1 1
Would become (after reading the saved excel):
writer = pd.ExcelWriter('test.xlsx')
df.to_excel(writer, merge_cells=False)
writer.save()
writer.close()
df2 = pd.read_excel('test.xlsx')
0 1 AA1.a.ng/mL AA2.a.N/A CB1.b.N/A BB2.b.mL CC1.c.N/A CC1.d.EU/mL
0 A 1 1 1 1 1 1 1
1 A 2 1 1 1 1 1 1
2 B 1 1 1 1 1 1 1
3 B 2 1 1 1 1 1 1
4 C 1 1 1 1 1 1 1
5 C 2 1 1 1 1 1 1
I am aware of the function wide to long but this function does assume common stubnames which are not present here.
How can I restore the original multiindex dataframe from the saved excel (with merged_cell=False
) ?