0

I have an excel workbook that has 100+ tabs. I would like to create multiple files based on the sheet names from the keep_sheets lists below.

The file is very large so it's taking a long time to load the workbook. In the code below, it's savings the workbook after every For loop and therefore I'm having to re-load the workbook after each loop. Is there a better way to do this? Ideally, I would like to only load the workbook 1 time if possible.

wb = openpyxl.load_workbook('Book1.xlsx')

all_sheets = wb.sheetnames

keep_sheets = ['Sheet1','Sheet2','Sheet3']

for sheetName in all_sheets:
    if sheetName not in keep_sheets:
        del wb[sheetName]
wb.save(f'{filepath_updated}File1 - {curr_period}-Updated.xlsx')

wb = openpyxl.load_workbook('Book1.xlsx')

keep_sheets2 = ['Sheet4','Sheet5','Sheet6']
all_sheets = wb.sheetnames

for sheetName in all_sheets:
    if sheetName not in keep_sheets2:
        del wb[sheetName]
wb.save(f'{filepath_updated}File2 - {curr_period}-Updated.xlsx')

wb = openpyxl.load_workbook('Book1.xlsx')

keep_sheets3 = ['Sheet7','Sheet8','Sheet9']
all_sheets = wb.sheetnames

for sheetName in all_sheets:
    if sheetName not in keep_sheets3:
        del wb[sheetName]
wb.save(f'{filepath_updated}File3 - {curr_period}-Updated.xlsx')
bbalch
  • 33
  • 1
  • 2
  • 8
  • There's [another thread](https://stackoverflow.com/questions/42344041/copy-worksheet-from-one-workbook-to-another-one-using-openpyxl) that discussed this and it appears there's no particularly good options with openpyxl other than what you're doing, or copying all the cells. – sj95126 Aug 19 '21 at 22:59
  • What's wrong with read-only mode if you're creating new workbooks? – Charlie Clark Aug 20 '21 at 09:07
  • I did try to the read-only mode and it does load much more quickly. However, I received a "TyperError: Workbook is read-only" when the code gets to wb.save – bbalch Aug 20 '21 at 13:40

1 Answers1

0

An idea would be to open the workbook and copy sheets from one workbook to another. However, as per the docs at openpyxl.workbook.workbook module you can only use copy_worksheet(from_worksheet) function within the workbook that they belong to.

Therefore, maybe we can read the file in once as as binary file and then perhaps use a function to create a new work book from the single binary file (e.g., load_workbook(b_file)). This would avoid loading the Excel file over and over.

The code below:

  1. Loads an Excel file (it has multiple sheets for example: ['Instructions', 'SalesPivot', 'Sales Data', 'MyLinks']). The file is available to download from here. It has formatting and filters or slicers on a different sheet to the main data sheet.
  2. Creates two lists from the list of sheetnames (a list with names mentioning 'Sales' or 'sales' and a list of sheets not mentioning 'Sales' or 'sales'. Just something for the demo. The lists could be manually created etc.
  3. Calls a function createSubWb(source_wb, l) that takes the original (main) workbook with all the sheets and a list that contains the sheet names to keep and returns a new workbook that you can save as required.

In the example code there is also a function to generate a timestamp for use in the out file name. It may not be required - change as required.

Code:

from openpyxl import load_workbook
from openpyxl import Workbook
import sys
import datetime as dt

# sample file from:
# https://www.contextures.com/excelpivottableslicers.html#download
# https://www.contextures.com/pivotsamples/regionsalesslicer.zip


# set filename/path etc.
fname = 'regionsalesslicer.xlsx'

# attempt to open the file
try:
    b_file = open(fname, 'rb')
except IOError:
    try:
        print('Could not open/read file:', fname)
        sys.exit(1)
    except SystemExit:
        print('Exit')

# function that takes a source workbook and a list of sheet names to keep
def createSubWb(source_wb, l):
    wb = load_workbook(b_file)
    for sheet in source_wb.sheetnames:
        if sheet not in l:
            del wb[sheet]
    return wb

# function used in the output of the file to get
# give today is August 20 2021 it would output
# '2021-07-31___2021-08-31' (end if last month to end of current)
def getTimePeriod():
    d = dt.date.today()
    s = (str(d.replace(day=1) - dt.timedelta(days=1))
        + '___'
        + str(dt.date(d.year + d.month // 12, 
              d.month % 12 + 1, 1) - dt.timedelta(1))
        )
    return s



# sheetnames contained, in this example:
# ['Instructions', 'SalesPivot', 'Sales Data', 'MyLinks']
main_wb = load_workbook(b_file)

# create some lists
# for example just get 'SalesPivot' and 'Sales Data' names into a list to keep
keep = [sheet for sheet in main_wb.sheetnames if 'sales' in sheet.lower()]
# get anything else into another list
keep2 = [sheet for sheet in main_wb.sheetnames if 'sales' not in sheet.lower()]

# call the function with same main_wb but different lists
createSubWb(main_wb, keep).save('keep_sales_sheets' + getTimePeriod() + '.xlsx')
createSubWb(main_wb, keep2).save('keep_NON_sales_sheets' + getTimePeriod() + '.xlsx')

Results:

Two new files are created from the original file. The sample (demo) file has formatting and slicers and are working.

enter image description here

MDR
  • 2,610
  • 1
  • 8
  • 18