8

I have code from a while ago that I am re-using for a new task. The task is to write a new DataFrame into a new sheet, into an existing excel file. But there is one part of the code that I do not understand, but it just makes the code "work".

working:

from openpyxl import load_workbook
import pandas as pd
file = r'YOUR_PATH_TO_EXCEL_HERE'

df1 = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
book = load_workbook(file)
writer = pd.ExcelWriter(file, engine='openpyxl')
writer.book = book # <---------------------------- piece i do not understand
df1.to_excel(writer, sheet_name='New', index=None)
writer.save()

The little line of writer.book=book has me stumped. Without that piece of code, the Excel file will delete all other sheets, except the sheet used in the sheetname= parameter in df1.to_excel.

i looked at xlsxwriter's documentation as well as openpyxl's, but cannot seem to figure out why that line gives me my expected output. Any ideas?

edit: i believe this post is where i got the original idea from.

MattR
  • 4,887
  • 9
  • 40
  • 67
  • 1
    I think panda will just use the writer to write the data to a new book thus ovewriting the existing book if you don't specify that the book already has data. – T4rk1n Jul 20 '17 at 16:27
  • I did not know there was a way to prevent pandas from overwriting the whole workbook. This is very good to know! – saintsfan342000 Jul 20 '17 at 16:39
  • @saintsfan342000 glad I could help! I thought this was neat as well. Now it's just getting to the bottom of why this works and how I can expand on it after i figure this out! – MattR Jul 20 '17 at 16:41
  • In terms of understanding what is going on, it is worth noting that there is no such thing as "specifying that the book already has data". Even when you are using the genuine Excel program from Microsoft, whenever you save, it is **always** writing a completely, utterly brand-new file. It *so happens* that the brand-new file can have the exact same name as the old file, and in that case, it *seems as though* the file has been "updated", but really the old file was overwritten by the new. – John Y Jun 07 '19 at 21:33

1 Answers1

4

In the source code of ExcelWriter, with openpyxl, it initializes empty workbook and delete all sheets. That's why you need to add it explicitly

class _OpenpyxlWriter(ExcelWriter):
    engine = 'openpyxl'
    supported_extensions = ('.xlsx', '.xlsm')

    def __init__(self, path, engine=None, **engine_kwargs):
        # Use the openpyxl module as the Excel writer.
        from openpyxl.workbook import Workbook

        super(_OpenpyxlWriter, self).__init__(path, **engine_kwargs)

        # Create workbook object with default optimized_write=True.
        self.book = Workbook()

        # Openpyxl 1.6.1 adds a dummy sheet. We remove it.
        if self.book.worksheets:
            try:
                self.book.remove(self.book.worksheets[0])
            except AttributeError:

                # compat
                self.book.remove_sheet(self.book.worksheets[0])
Aritesh
  • 1,985
  • 1
  • 13
  • 17