I want to loop through many Excel files in a single folder and pull data from certain tabs, but only if those tabs contain a certain string value. For example, one spreadsheet may have 20 tabs, but I only want the tab (and its data) that contains the string "Apples" somewhere in that tab. (It appears to always be in the first row.) I then want to aggregate all of these tabs into one spreadsheet. This problem differs from previous SO questions because my tabs are not uniformly named: sometimes the tab I want is called "Apple Sauce" and other times it's called "Apple Jacks". This is why I need to look inside the tab itself for my string and can't rely on specifying the sheet name.
I have written the following code so far:
import pandas as pd
import os

root = r"my_dir"
agg_df = pd.DataFrame()

for directory, subdirectory, files in os.walk(root):
    for file in files:
        if file.endswith('.xlsm'):
            filepath = os.path.join(directory, file)
            # I want to do some kind of if statement here maybe to say if sheet_name.contains("Apples")
            df_temp = pd.read_excel(filepath)  # reads only the first sheet by default
            df_temp['Filepath'] = filepath
            # DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
            agg_df = pd.concat([agg_df, df_temp])
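
One possible direction, as a minimal sketch rather than a tested solution: passing `sheet_name=None` to `pd.read_excel` returns a dict of every tab keyed by its name, and `header=None` keeps the tab's literal first row as row 0 so it can be searched for the string. The helper names `sheets_containing` and `aggregate_folder` and the extra `Sheet` column are made up for illustration, and the sketch assumes `openpyxl` is installed so pandas can read `.xlsm` files.

```python
import os
import pandas as pd


def sheets_containing(sheets, needle):
    """Return {sheet name: DataFrame} for sheets whose first row contains `needle`.

    `sheets` is a dict of DataFrames read with header=None, so row 0 is the
    sheet's literal first row. (Hypothetical helper, not part of pandas.)
    """
    matches = {}
    for name, df in sheets.items():
        if df.empty:
            continue  # a blank tab has no first row to inspect
        first_row = df.iloc[0].astype(str)
        # Plain, case-sensitive substring match on each cell of the first row
        if first_row.str.contains(needle, regex=False).any():
            matches[name] = df
    return matches


def aggregate_folder(root, needle="Apples"):
    """Collect every matching tab under `root` into one DataFrame."""
    frames = []
    for directory, _subdirs, files in os.walk(root):
        for file in files:
            if not file.endswith(".xlsm"):
                continue
            filepath = os.path.join(directory, file)
            # sheet_name=None loads every tab into a dict keyed by tab name
            sheets = pd.read_excel(filepath, sheet_name=None, header=None)
            for name, df in sheets_containing(sheets, needle).items():
                df = df.copy()
                df["Filepath"] = filepath
                df["Sheet"] = name  # assumed extra column to record the tab
                frames.append(df)
    # Concatenate once at the end instead of growing a DataFrame in the loop
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Calling `aggregate_folder(root)` would then stand in for the loop above. Because the tabs are read with `header=None`, the header row of each tab ends up as an ordinary data row in the result; if the string could appear anywhere in a tab rather than only in the first row, the check would need to scan all cells instead of `df.iloc[0]`.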