I have a series of large (and poorly formatted) Excel spreadsheets that I am trying to process with pandas. Each Excel file contains 50-60 sheets, and I am only interested in a subset of the sheets within each file.
I have tried reading the entire spreadsheet as a `pd.ExcelFile` object so that I can use its `sheet_names` attribute to decide which sheets to parse (I don't know the sheet names ahead of time). This works, but it seems exceptionally slow (close to a minute for each ~30 MB Excel file).
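For concreteness, the approach described above looks roughly like the sketch below. The helper names `pick_sheets` and `load_matching_sheets`, the file name `report.xlsx`, and the substring filter are placeholders for illustration only, not the real code or selection logic.

```python
def pick_sheets(sheet_names, keyword):
    """Return the sheet names that contain `keyword` (case-insensitive)."""
    return [name for name in sheet_names if keyword.lower() in name.lower()]


def load_matching_sheets(path, keyword):
    """Open the workbook once, then parse only the sheets of interest."""
    # Imported here so pick_sheets stays usable without pandas installed.
    import pandas as pd

    # Opening the workbook is the step suspected of being slow.
    with pd.ExcelFile(path) as xls:
        wanted = pick_sheets(xls.sheet_names, keyword)
        # Parse each selected sheet into its own DataFrame.
        return {name: xls.parse(name) for name in wanted}
```

A call such as `frames = load_matching_sheets("report.xlsx", "data")` would then give a dict mapping each matching sheet name to a DataFrame.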
I can only assume this is because every sheet is parsed when the `pd.ExcelFile` object is initialised (though I could be wrong). If so, is there a way to prevent this behaviour? I really only want to get the sheet names first, and then parse the specific sheets from there.
Thanks in advance!