An idea would be to open the workbook and copy sheets from one workbook to another. However, as per the docs at openpyxl.workbook.workbook module you can only use copy_worksheet(from_worksheet)
function within the workbook that they belong to.
Therefore, maybe we can read the file in once as as binary file and then perhaps use a function to create a new work book from the single binary file (e.g., load_workbook(b_file)
). This would avoid loading the Excel file over and over.
The code below:
- Loads an Excel file (it has multiple sheets for example:
['Instructions', 'SalesPivot', 'Sales Data', 'MyLinks']
). The file is available to download from here. It has formatting and filters or slicers on a different sheet to the main data sheet.
- Creates two lists from the list of sheetnames (a list with names mentioning 'Sales' or 'sales' and a list of sheets not mentioning 'Sales' or 'sales'. Just something for the demo. The lists could be manually created etc.
- Calls a function
createSubWb(source_wb, l)
that takes the original (main) workbook with all the sheets and a list that contains the sheet names to keep and returns a new workbook that you can save as required.
In the example code there is also a function to generate a timestamp for use in the out file name. It may not be required - change as required.
Code:
from openpyxl import load_workbook
from openpyxl import Workbook
import sys
import datetime as dt
# sample file from:
# https://www.contextures.com/excelpivottableslicers.html#download
# https://www.contextures.com/pivotsamples/regionsalesslicer.zip
# set filename/path etc.
fname = 'regionsalesslicer.xlsx'
# attempt to open the file
try:
b_file = open(fname, 'rb')
except IOError:
try:
print('Could not open/read file:', fname)
sys.exit(1)
except SystemExit:
print('Exit')
# function that takes a source workbook and a list of sheet names to keep
def createSubWb(source_wb, l):
wb = load_workbook(b_file)
for sheet in source_wb.sheetnames:
if sheet not in l:
del wb[sheet]
return wb
# function used in the output of the file to get
# give today is August 20 2021 it would output
# '2021-07-31___2021-08-31' (end if last month to end of current)
def getTimePeriod():
d = dt.date.today()
s = (str(d.replace(day=1) - dt.timedelta(days=1))
+ '___'
+ str(dt.date(d.year + d.month // 12,
d.month % 12 + 1, 1) - dt.timedelta(1))
)
return s
# sheetnames contained, in this example:
# ['Instructions', 'SalesPivot', 'Sales Data', 'MyLinks']
main_wb = load_workbook(b_file)
# create some lists
# for example just get 'SalesPivot' and 'Sales Data' names into a list to keep
keep = [sheet for sheet in main_wb.sheetnames if 'sales' in sheet.lower()]
# get anything else into another list
keep2 = [sheet for sheet in main_wb.sheetnames if 'sales' not in sheet.lower()]
# call the function with same main_wb but different lists
createSubWb(main_wb, keep).save('keep_sales_sheets' + getTimePeriod() + '.xlsx')
createSubWb(main_wb, keep2).save('keep_NON_sales_sheets' + getTimePeriod() + '.xlsx')
Results:
Two new files are created from the original file. The sample (demo) file has formatting and slicers and are working.
