0

I have two spreadsheet in '.xls' format, both have two sheets (sheet1 and sheet2). I am trying to overwrite the 'sheet2' in book2.xls with 'sheet2' from book1.xls. I am trying the pandas approach and here is my draft code,

import pandas as pd

# Open the first spreadsheet
df1 = pd.read_excel('Book1.xls', sheet_name='sheet2')

# Specify the writer file
writer = pd.ExcelWriter('Book2.xls')

# Write output
df1.to_excel(writer, 'sheet2', engine='xlswriter', index=False)

writer.save()

After merging, I expect only the 'sheet2' in book2.xls to be updated while retaining the 'sheet1'. However, thats not the case and book2.xls now only has 'sheet2' and deleted the 'sheet1'. I looked into several answers (e.g.,) and they work for '.xlsx' format using openpyxl. I am looking for a solution that deals with '.xls' format. Thanks for your help.

Also, I would have hoped, pandas.ExcelWrite to have one extra option for merging just a specified sheet, sigh!

****************************

Another approach I tried is here, but I get the error while merging the sheet.

import pandas as pd

# Open the first spreadsheet
df1 = pd.read_excel('Book1.xls')

excelBook = pd.ExcelFile(path+'Book2.xls')
writer = pd.ExcelWriter(path+'Book2.xls')
writer.book = excelBook

writer.sheets = dict((ws.title, ws) for ws in excelBook.sheet_names)

# Add new sheets
df1.to_excel(writer, "Sheet2", index=False)

# Save the file
writer.save()

`AttributeError: 'ExcelFile' object has no attribute 'add_sheet'`
pyPN
  • 105
  • 3
  • 9

2 Answers2

0
import pandas as pd

# Open the BOTH spreadsheets
book1sheet1 = pd.read_excel('Book1.xls', sheet_name='sheet2')
book2sheet1 = pd.read_excel('Book2.xls', sheet_name='sheet1')

# Specify the writer file
writer = pd.ExcelWriter('Book2.xls')

# Write output
book2sheet1.to_excel(writer, 'sheet1', engine='xlswriter', index=False)
book1sheet1.to_excel(writer, 'sheet2', engine='xlswriter', index=False)

writer.save()

I believe this will work. You are importing both at the beginning and rewriting the Book2 Sheet 1 when you run the to_excel command.


To keep the sheets without loading into memory, I would try something like this:

import pandas as pd
from openpyxl import load_workbook

path = "Book2.xls"

book = load_workbook(path)
writer = pd.ExcelWriter(path, engine = 'openpyxl')
writer.book = book

# this is the original sheet you wanted to add
df_added = pd.read_excel('Book1.xls', sheet_name='sheet2')


df_added.to_excel(writer, sheet_name = 'Book1Sheet1')

writer.save()
writer.close()
jpf5046
  • 729
  • 7
  • 32
  • Thanks! Yes, it works. Let's say, if I have over 100 sheets in book2.xls, then is there a simpler way to retain the rest of the sheets without having to define and write to output file? – pyPN Sep 23 '19 at 18:25
  • openpyxl will not work with `.xls` format. Here is the error, `openpyxl.utils.exceptions.InvalidFileException: openpyxl does not support the old .xls file format, please use xlrd to read this file, or convert it to the more recent .xlsx file format.` – pyPN Sep 23 '19 at 19:01
  • try it the old way, replace that line with `writer = pd.ExcelWriter(path)` see if that works. – jpf5046 Sep 23 '19 at 19:02
  • I think the error is coming from `book = load_workbook(path)`. Thanks – pyPN Sep 23 '19 at 19:11
  • oh right. depending on the effort, maybe formatting your `.xls` files as `.xlsx` files is the way to go. – jpf5046 Sep 23 '19 at 19:23
  • Yes, that is one way to go and I was looking solution for `.xls` fomat.. :), Thanks! – pyPN Sep 23 '19 at 19:27
  • @MaxU..I tried your [approach](https://stackoverflow.com/questions/49519696/getting-attributeerror-workbook-object-has-no-attribute-add-worksheet-whil?rq=1). I get the error, `AttributeError: 'ExcelFile' object has no attribute 'add_sheet'`. Any thoughts? Thanks – pyPN Sep 23 '19 at 20:56
0

Here is one solution that works. Although the sheet gets merged, not the formatting and links associated with columns. If any of you have a better solution, please feel free to share. Thanks!

# Read excel file and get dict of sheetname(s), dataframe(s))

# File1
dict_1 = pd.read_excel('Book1.xls', sheet_name=None)

# File2
dict_2 = pd.read_excel('Book2.xls', sheet_name=None)

# Change the worksheet as dataframe
select_df1 = dict_1['Sheet2']

# Assign the selected dataframe to second worksheet
dict_2['Sheet2'] = select_df1

# Write all the sheets from dataframe2 that also has the updated sheet from File1
with pd.ExcelWriter('Book2.xls', 
                    engine='xlwt') as writer:
    # Write Sheets from dataframe2
    for ws_name, df_sheet in dict_2.items():
        df_sheet.to_excel(writer, sheet_name=ws_name, index=0)

writer.save()
writer.close()
pyPN
  • 105
  • 3
  • 9