When saving a DataFrame to CSV or Excel, pandas automatically adds the row index as the first column. I know that the index=False argument avoids this. However, if my DataFrame has multi-level (MultiIndex) columns, writing to Excel with index=False raises this error:

NotImplementedError: Writing to Excel with MultiIndex columns and no index ('index'=False) is not yet implemented.

Is there another way to skip this first column while keeping the multi-level column names as header rows in the Excel file?

Example code to generate the DataFrame:

import pandas as pd
import numpy as np

col = pd.MultiIndex.from_arrays([['one', 'one', 'one', 'two', 'two', 'two'],
                                ['a', 'b', 'c', 'a', 'b', 'c']])
data = pd.DataFrame(np.random.randn(4, 6), columns=col)
data.to_excel('test.xlsx')

Opening the resulting Excel file shows the output (screenshot omitted).

I would like to keep B1:G2 as my column header structure and drop column A (and also row 3, A3:G3). Thank you for any help.

  • Why not just reset the index and then write to file? – MYousefi Jun 16 '22 at 07:58
  • Do you mean flattening the multi-index to a single level? I need to keep the multi-level column format so that less manual work is needed after exporting to Excel. – wen tse yang Jun 16 '22 at 08:15
  • I didn't try it but it should probably work like @MYousefi mentioned, if you don't set `inplace=True` in the reset index you can still keep your multi-index df. Something like this: `df.reset_index().to_excel(index=False)` – Yehla Jun 16 '22 at 09:41
  • @Yehla I just tried this and still see the error. I need this df to have a 3-level column multi-index because I want the Excel file to show the first 3 rows as my 3-level column names. – wen tse yang Jun 16 '22 at 10:28
  • @wen tse yang Sorry, I'm a bit confused. Do you want to see the index in the Excel file or not? I just tried `df.reset_index(drop=True).to_excel("test.xlsx", index=False)` and it worked for me. If you want the index to appear as the first column, remove `drop=True`. The initial df will keep its multi-index. – Yehla Jun 16 '22 at 11:52
  • @Yehla Thank you for the prompt reply. My df looks like this: https://stackoverflow.com/questions/18470323/selecting-columns-from-pandas-multiindex. I want to keep the column name structure and I don't want to see the row index (drop the first column). – wen tse yang Jun 16 '22 at 13:34
  • I've never tried saving multi-level column names. I have, however, saved such a df after using to_flat_index on the multi-level columns. It essentially replaces the multi-level columns with `level1.level2` columns. Maybe you can save it that way. – MYousefi Jun 16 '22 at 14:42
  • @MYousefi Thank you for the suggestion. I hope there's some way to keep multi-column name structure in the excel. – wen tse yang Jun 17 '22 at 02:11
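
For reference, the flattening workaround suggested in the comments could look roughly like the sketch below. It assumes the two-level example columns from the question, uses stand-in integer data instead of random values, and joins the levels with a dot; the name `flat` is only illustrative. It gives up the multi-row header, so it does not fully answer the question, but the resulting columns are no longer a MultiIndex.

```python
import numpy as np
import pandas as pd

col = pd.MultiIndex.from_arrays([['one', 'one', 'one', 'two', 'two', 'two'],
                                 ['a', 'b', 'c', 'a', 'b', 'c']])
# Stand-in data (an assumption); the question uses np.random.randn(4, 6).
data = pd.DataFrame(np.arange(24).reshape(4, 6), columns=col)

flat = data.copy()  # leave the original MultiIndex DataFrame untouched
# to_flat_index() turns each column label into a tuple such as ('one', 'a');
# joining the parts gives single-level names such as 'one.a'.
flat.columns = ['.'.join(map(str, levels))
                for levels in data.columns.to_flat_index()]

print(list(flat.columns))
# flat.to_excel('test.xlsx', index=False)  # single header row, no index column
```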

2 Answers

I think currently this is not possible with pandas. You could however solve it with openpyxl. Something like this might do the trick:

from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

# opening an excel workbook and worksheet
wb = Workbook()
ws = wb.active

# writing dataframe to excel
for r in dataframe_to_rows(data, index=False, header=True):
    ws.append(r)

# merging header cells (assumes every top-level label spans exactly 3 columns, as in the example)
for merge in range(data.shape[1] // 3):
    ws.merge_cells(start_row=1, end_row=1, start_column=merge*3+1, end_column=merge*3+3)

# saving to excel
wb.save("test.xlsx")

There is surely a nicer way to merge the header cells, but this should be enough to give you the idea. The output file looks like this: (screenshot omitted)

With openpyxl you can adjust the formatting as well, if this matters to you.
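
One possible way to avoid hard-coding the group width of 3 is to derive the merge ranges from the column labels themselves. The following is only a sketch under a few assumptions: `header_runs` and `write_with_merged_header` are hypothetical helper names, the DataFrame has MultiIndex columns, and `dataframe_to_rows` writes one header row per column level (this may differ in older openpyxl versions).

```python
from itertools import groupby


def header_runs(keys):
    """Return 1-based (start, end) column spans for each run of equal adjacent keys."""
    runs = []
    start = 1
    for _, group in groupby(keys):
        length = sum(1 for _ in group)
        runs.append((start, start + length - 1))
        start += length
    return runs


def write_with_merged_header(df, path):
    """Write df without its row index and merge repeated header labels."""
    # Imported here so header_runs stays usable without openpyxl installed.
    from openpyxl import Workbook
    from openpyxl.utils.dataframe import dataframe_to_rows

    wb = Workbook()
    ws = wb.active
    for row in dataframe_to_rows(df, index=False, header=True):
        ws.append(row)

    columns = list(df.columns)  # tuples such as ('one', 'a') for MultiIndex columns
    # Merge every level except the last; header row numbers are 1-based.
    for level in range(df.columns.nlevels - 1):
        # Comparing label prefixes keeps a merge from crossing a parent boundary.
        keys = [label[: level + 1] for label in columns]
        for start, end in header_runs(keys):
            if end > start:
                ws.merge_cells(start_row=level + 1, end_row=level + 1,
                               start_column=start, end_column=end)
    wb.save(path)
```

With the DataFrame from the question, `write_with_merged_header(data, "test.xlsx")` should merge A1:C1 and D1:F1, the same cells as the fixed-width loop above.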

Yehla

This might solve your problem:

data.T.reset_index(level=1, drop=True).T.to_excel("test.xlsx", index=False)

The first part, data.T.reset_index(level=1, drop=True), transposes the DataFrame and drops index level 1 (in your case the a, b, c level). The second part, .T.to_excel("test.xlsx", index=False), transposes the DataFrame back to its original orientation and omits the row index when writing to Excel. Note that only a single header row (one, one, one, two, two, two) remains.

The output looks like this: (screenshot omitted)
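
As a side note, the double transpose is probably not needed: `DataFrame.droplevel` with `axis=1` should give the same columns directly. The sketch below compares the two on stand-in integer data (an assumption; the question uses random values), with `roundtrip` and `direct` as illustrative names.

```python
import numpy as np
import pandas as pd

col = pd.MultiIndex.from_arrays([['one', 'one', 'one', 'two', 'two', 'two'],
                                 ['a', 'b', 'c', 'a', 'b', 'c']])
# Stand-in data (an assumption); the question uses np.random.randn(4, 6).
data = pd.DataFrame(np.arange(24).reshape(4, 6), columns=col)

# The one-liner from this answer: transpose, drop index level 1, transpose back.
roundtrip = data.T.reset_index(level=1, drop=True).T
# A more direct spelling: drop level 1 of the column MultiIndex.
direct = data.droplevel(1, axis=1)

print(list(direct.columns))
# direct.to_excel("test.xlsx", index=False)  # single header row, no index column
```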

Yehla
  • Thank you but what I need is to keep the column name format as shown in my figure. Is it possible to keep the first row merged format ("one" takes 3 columns and "two" takes 3 columns) and the second row of ('a', 'b', 'c', 'a', 'b', 'c')? – wen tse yang Jun 21 '22 at 06:02
  • Aaah I really misunderstood you. I think there is no way to do this with pandas yet. I added another answer to do it with openpyxl. Hope this helps. – Yehla Jun 22 '22 at 12:48