7

Per https://github.com/pandas-dev/pandas/pull/21251/files/09e5b456e1af5cde55f18f903ab90c761643b05a, we should be able to append DataFrames to new XLSX sheets.

Based on the documentation, I tried the following:

>>> import pandas as pd
>>>                
... d1 = pd.DataFrame({"A":['Bob','Joe', 'Mark'], 
...                "B":['5', '10', '20']})
>>> d2 = pd.DataFrame({"A":['Jeffrey','Ann', 'Sue'], 
...                "B":['1', '2', '3']})
>>> 
>>> # Create XLSX document for ticker
... writer = pd.ExcelWriter('test.xlsx',engine='openpyxl')
>>> d1.to_excel(writer,sheet_name='d1')
>>> writer.save()
>>> 
>>> writer = pd.ExcelWriter('test.xlsx',engine='openpyxl', mode='a')
>>> d2.to_excel(writer,sheet_name='d2')
>>> writer.save()
>>> 
>>> pd.__version__
'0.23.4'     # Just updated this per a comment
>>> 
>>> 

The result is a single workbook named 'test.xlsx' with a single tab 'd2'.

How can I prevent the workbook/sheet form being overwritten?

  • Possible duplicate of [How to write to an existing excel file without overwriting data (using pandas)?](https://stackoverflow.com/questions/20219254/how-to-write-to-an-existing-excel-file-without-overwriting-data-using-pandas) – Poojan Jan 14 '19 at 17:39
  • 1
    Sorry, a bit too quick to mark as duplicate. This was introduced in `pandas` version `0.23.1` so you need to upgrade. – user3471881 Jan 14 '19 at 17:41
  • Thanks, I upgraded per your suggestion. I was using "pip3 install pandas" instead of adding "--upgrade" so I thought I had the latest version. Per your suggestion, I updated but am still getting the same result. – enter_display_name_here Jan 14 '19 at 17:48
  • Also, the other referenced is for the function 'df.to_excel()', not 'pd.ExcelWriter'. – enter_display_name_here Jan 14 '19 at 17:51
  • Are you running this on a REPL? Make sure you restart your shell since the `pandas` module is loaded in memory with the older version. – r.ook Jan 14 '19 at 17:51
  • Based on my search, I am guessing that "REPL" is typing directly into the terminal. So since I am on a Mac, I am logged into the Terminal and entered commands one by one. I did two things per your recommendation (1) Closed all Terminal windows, then restarted Terminal; (2) Ran the script via the Terminal, i.e. "python3 test.py". The results from both cases are the same, unfortunately. – enter_display_name_here Jan 14 '19 at 17:59

4 Answers4

10

You can use with:

with pd.ExcelWriter('test.xlsx', engine='openpyxl', mode='a') as writer:
    d1.to_excel(writer,sheet_name='d1')
    d2.to_excel(writer,sheet_name='d2')
    writer.save()

writer.close()

update

This should work just note that the a blank file needs to be created before hand. You can just create a blank file using python if you want. I created a simple loop to, in some ways, mimic the essence of what you are trying to accomplish:

import pandas as pd
from openpyxl import load_workbook

d1 = pd.DataFrame({"A":['Bob','Joe', 'Mark'], 
               "B":['5', '10', '20']})
d2 = pd.DataFrame({"A":['Jeffrey','Ann', 'Sue'], 
                "B":['1', '2', '3']})

dfs = [d1,d2]

for i in range(len(dfs)):
    sheet = 'd'+str(i+1)
    data = dfs[i]
    writer = pd.ExcelWriter('atest.xlsx',engine='openpyxl', mode='a')
    writer.book = load_workbook('atest.xlsx') # here is the difference
    data.to_excel(writer,sheet_name=sheet)
    writer.save()
    writer.close()

or here is the modified first example:

d1 = pd.DataFrame({"A":['Bob','Joe', 'Mark'], 
               "B":['5', '10', '20']})
d2 = pd.DataFrame({"A":['Jeffrey','Ann', 'Sue'], 
                "B":['1', '2', '3']})

writer = pd.ExcelWriter('atest.xlsx', engine='openpyxl', mode='w')
d1.to_excel(writer,sheet_name='d1')
writer.save()
writer.close()

writer = pd.ExcelWriter('atest.xlsx', engine='openpyxl', mode='a')
writer.book = load_workbook('atest.xlsx')
d2.to_excel(writer,sheet_name='d2')
writer.save()
writer.close()
It_is_Chris
  • 13,504
  • 2
  • 23
  • 41
  • I tried this but the result is comparable to https://xlsxwriter.readthedocs.io/example_pandas_multiple.html. It's a different engine but the workbook is opened only once then both tabs are written. I need to open/write/close then open/write/close. I tried your suggestion by putting `d1.to_excel()` and `d2.to_excel()` in separate `with` loops but the result was a single XLSX with a single tab. I also tried your suggestion but with setting `mode='w'`. It worked the same as `mode='a'`. – enter_display_name_here Jan 14 '19 at 18:14
  • @enter_display_name_here why do you need to open/write/close multiple times? – It_is_Chris Jan 14 '19 at 18:17
  • I am building an extremely large dictionary of DataFrames and would like to write out each DataFrame as a tab as they get completed. I would hate to get 100+ DataFrames in and then the program experiences an issue and nothing is written out. Basically, I am scraping 4 websites, compiling different tables of data into a single DF, then doing that process over and over again. I'd like to generate an individual tab per DF as it goes through the program. – enter_display_name_here Jan 14 '19 at 18:21
  • Thanks for taking the time to put that together. Your update is actually the workaround to what was done prior to the addition of `mode='a'`. If you refer to line 1009 (here: https://github.com/pandas-dev/pandas/pull/21251/files/09e5b456e1af5cde55f18f903ab90c761643b05a#diff-0153a977179cfc73a2645cf0b7cd1f7fR1009), you'll notice that they added `book = load_workbook(self.path)` to their code. Your code works the same as with `mode='w'`. If I am unable to get `mode='a'` to work, then I will use this. I am hoping that their new code will check for an existing tab and then overwrite that. – enter_display_name_here Jan 14 '19 at 19:14
1

import pandas as pd

writer = pd.ExcelWriter(wk_path + save_file)
# ....
# build sc_files DataFrame and save. sc_files includes
# a column called OS.

sc_file.to_excel(writer, sheet_name='test')

# build data frame of OS counts out of sc_file
counts_os = sc_file.OS.value_counts() 

# To append to 'test' sheet, use startcol=x1, startrow=y
# To append counts_OS to the end of the current 'test' sheet
y = len(sc_file)
y += 1
counts_os.to_excel(writer, sheet_name='test', 
    startcol=1, startrow=y)

# write counts_os to sheet test2 
counts_os.to_excel(writer, sheet_name='test2')
writer.save()
writer.close()
  • Just re-read and see that I answered how to append to the same tab but the question is about multiple tabs. I have a program similar to yours that writes multiple tabs successfully. I don't specify engine='openpyxl' and I don't do a writer.save() until the very end of the program - last step. Works though I could see limits if my data was too large. – Adam Safier Apr 11 '19 at 00:28
1

This worked for me, it creates a file if the file does not exists, and append to the end of the file if it already exists

you may need to install openpyxl

import pandas as pd
from openpyxl import load_workbook

def append_xlsx(df, file = 'results.xlsx'):
    ext = '.xlsx'
    if ext not in file:
        file+=ext
    
    if os.path.exists(file):
        mode="a"
        if_sheet_exists="overlay"
        header = False

        wb = load_workbook(file)
        sheet = wb.worksheets[0]
        startrow = sheet.max_row
    else:
        mode='w'
        if_sheet_exists = None
        header = True
        startrow = 0

    with pd.ExcelWriter(
        file,
        mode=mode,
        engine="openpyxl",
        if_sheet_exists=if_sheet_exists,
    ) as writer:

        df.to_excel(
            writer, 
            sheet_name="Sheet1",
            startrow=startrow,
            header=header,
            index=False,
            encoding='utf8'
        )
0

I submitted a post on GitHub and received a response from the contributors (see the highlighted portion below). It turns out that this functionality won't be released until 0.24 so it is not available in 0.23.1. FYI - I downloaded the RC and successfully tried out the mode='a'option. However, there may be a bug with workbooks that do not exist; I receive FileNotFoundError: [Errno 2] No such file or directory: 'test.xlsx'.

"this feature is being released as part of 0.24 which we just issued a release candidate for over the past few days. You can try on the RC or here on master and if neither works open an issue per the contributing guide, but this wouldn't be expected to work on versions older than that"

  • did you create the blank `test.xlsx` file first – It_is_Chris Jan 14 '19 at 21:10
  • Yep, I tried that and it worked. I also just double-checked their code. At first, I thought a new workbook would be created if one doesn't exist, but I think that's incorrect. I think the code creates a new workbook if `mode=` is not `'a'`, not if a workbook does not exist. – enter_display_name_here Jan 14 '19 at 21:27
  • 2
    After looking at the code that is correct, a file is not created when `mode=='a'` they are just using `load_workbook`, which means the file needs to exist prior to running `ExcelWriter` with the param `mode='a'` if `mode != a` then they are creating a new workbook: `self.book = Workbook()` – It_is_Chris Jan 14 '19 at 22:07