1

I need to add a row above the header columns in a dataframe, which will be converted to an excel file, with the limitation that there cannot be any reading / writing of files locally. Because of this, I am unable to use open('filename.xls', 'w') as f:

This is because the script is to be run in a place where files cannot be read/written locally.

So for example, I want something like this

text here
*animal* *no_of_legs* *name*
cat       4            meow
bird      2            chirp
rabbit    2            bun

I have an array allAnimals consisting of all the animals data. I tried allAnimals.insert(0,['text here]) then df = pd.DataFrame(allAnimals, columns=['animals', 'no_of_legs', 'name']) to convert as a dataframe. I then use df.to_excel(xxx, index=False) but I get something like this instead:

*animal* *no_of_legs* *name*
text here
cat       4            meow
bird      2            chirp
rabbit    2            bun

Alternatively, another method I tried involved creating a new dataframe storing only 'text here', and then tried to use concat but it doesn't add the data horizontally. It adds a new column instead. The same goes for append. So this would be what I get:

            *animal* *no_of_legs* *name*
text here
             cat       4            meow
             bird      2            chirp
             rabbit    2            bun

I have read some other questions similar to this but they are not very applicable to my case, as such, I am unable to solve this. Any tips would be appreciated!

Is it possible to add a row above the headers in pandas

Add rows *on top of* column names Pandas Dataframe as header info?

How to add an empty row on top of the header row in a dataframe?

Add rows back to the top of a dataframe

MatCode
  • 114
  • 4
  • 14
  • 1
    ok, i am trying it out. if it doesn't work, i shall ask for my question to be reopened. – MatCode Aug 11 '22 at 07:28
  • 1
    The question did mention that there cannot be any reading / writing of files locally, so the proposed solution of writing pandas dataframe to excel seems to be rather inapplicable. This question seems to be completely different in that it is trying to ask for solutions on working with DataFrames on pandas, rather than writing pandas dataframe to excel. – piplup Aug 11 '22 at 07:40
  • @jezrael im sorry, but i tried the method in the proposed solution, it was not helpful because for the portion of "pd.ExcelWriter(file-name, engine="xlsxwriter")", i am unable to read the file-name. I tried replacing it with my version and I faced an error. – MatCode Aug 11 '22 at 07:43

2 Answers2

3

You can edit your dataframe's column with an additional column index level by making it a MultiIndex. Assign df.columns to be pd.MultiIndex:

Attempting to reproduce the example:

import pandas as pd


allAnimals = [['cat', 4, 'meow'], ['bird', '2', 'chirp'], ['rabbit', 2 ,'bun']]


df = pd.DataFrame(allAnimals, columns= ['animals', 'no_of_legs', 'name'])

df.columns = pd.MultiIndex.from_tuples(
    zip(['text here', '', ''], 
        df.columns))
df

Output:

output

Remove the index column accordingly if needed.

This may be of help to you: working with pandas additional column index

piplup
  • 119
  • 4
0

The above answer was a big step forward in answering the question, however, it involved multi-index which then disallowed me from using index=False in df.to_excel(xxx, index=False). This led to the excel file being off-format, since it is now index=True.

This means I got an excel file like this: excel file with formatting issues

Honestly, the only complication here is the limitation of not being able to read/write locally.

I went on to research on how to fix this, and came across some articles:

How to hide the rows index

Get rid of index while outputting multi header pandas dataframe to excel

I couldn't implement the first website's answer because I couldn't supply a local path. I decided to use the second website's first workaround, which involved using openpyxl and using bytesIO.

In case anyone needs this in the future, here's the implementation:

from openpyxl import load_workbook, Workbook

in_memory_fp = io.BytesIO()
df.to_excel(in_memory_fp)
in_memory_fp.seek(0,0)

workbook = load_workbook(in_memory_fp)
worksheet = workbook['Sheet1']

worksheet.delete_cols(1)
worksheet.delete_rows(3)

for i in worksheet.values:
    print(i)

workbook.save(in_memory_fp)

and then now, the excel will be nicely formatted like this:excel file with nicer format

MatCode
  • 114
  • 4
  • 14