I have a lot of reports that I want to compile into a single DataFrame in Python.
The code below loops through my directory and reads all of the report files, and it works as long as the sheet name is the same in every file. Each workbook has many sheets, but I only want the sheets whose names contain a specific string, 'Report'.
import pandas as pd
from pathlib import Path
import os
import glob

pathsting = 'path/to/working/directory'
rootdir = Path(pathsting)
# Subdirectories of the working directory
onlydirs = [f for f in os.listdir(rootdir) if os.path.isdir(os.path.join(rootdir, f))]
df0 = pd.DataFrame()
for direct in onlydirs:
    print(direct)
    dirpathstring = pathsting + '\\' + direct
    dirpath = Path(dirpathstring)
    onlyfiles = [f for f in os.listdir(dirpath) if os.path.isfile(os.path.join(dirpath, f))]
    # Read the 'Report' sheet from every workbook ending in 'Report.xlsm'
    for f in dirpath.glob("*Report.xlsm"):
        print(f.name)
        temp = pd.read_excel(f, sheet_name='Report')
        df0 = pd.concat([df0, temp])
display(df0)
Now suppose that over time the report format changes and, instead of sheet_name='Report', it becomes sheet_name='XYZ Report'. I have many reports, and the sheet name changes a few times. I do not want to hard-code every possible sheet name in multiple different loops.
I was able to use glob to read all files that end in 'Report.xlsm', but is there a similar method to read sheet_names that contain the text 'Report' instead of the exact string?
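One direction that might work, sketched here rather than tested against my real files: pandas exposes a workbook's sheet names through `pd.ExcelFile(...).sheet_names`, so the names could be filtered with a plain substring check before reading. The helper names `matching_sheets`, `read_matching_sheets`, and `collect_reports` are made up for illustration, and the sketch assumes a case-sensitive match on 'Report' and that every matching sheet in a workbook should be read.

```python
from pathlib import Path


def matching_sheets(sheet_names, needle="Report"):
    """Return the sheet names that contain `needle` (case-sensitive)."""
    return [name for name in sheet_names if needle in name]


def read_matching_sheets(path, needle="Report"):
    """Read every sheet of one workbook whose name contains `needle`."""
    import pandas as pd  # imported here so matching_sheets works without pandas

    # ExcelFile opens the workbook once and lists its sheets via .sheet_names
    with pd.ExcelFile(path) as xls:
        names = matching_sheets(xls.sheet_names, needle)
        return [xls.parse(name) for name in names]


def collect_reports(rootdir, pattern="*Report.xlsm", needle="Report"):
    """Concatenate matching sheets from all workbooks one level below rootdir."""
    import pandas as pd

    frames = []
    for subdir in sorted(p for p in Path(rootdir).iterdir() if p.is_dir()):
        for f in subdir.glob(pattern):
            frames.extend(read_matching_sheets(f, needle))
    # Assumption: a fresh 0..n-1 index is acceptable for the combined frame
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

With this, something like `df0 = collect_reports('path/to/working/directory')` would stand in for the nested loops above. If glob-style patterns are preferred over a substring check, `fnmatch.filter(sheet_names, '*Report*')` from the standard library does a comparable job on the list of names.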