What are the Python 3 options to efficiently (in terms of performance and memory) extract the sheet names and, for a given sheet, the column names from a very large .xlsx file?
I've tried using pandas:
For sheet names, using pd.ExcelFile:
xl = pd.ExcelFile(filename)
sheet_names = xl.sheet_names
For column names, using pd.ExcelFile:
xl = pd.ExcelFile(filename)
df = xl.parse(sheetname, nrows=2, **kwargs)
df.columns
For column names, using pd.read_excel with and without nrows (pandas > 0.23):
df = pd.read_excel(io=filename, sheet_name=sheetname, nrows=2)
df.columns
However, both pd.ExcelFile and pd.read_excel seem to read the entire .xlsx file into memory and are therefore slow.
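One direction I have not benchmarked on my file: an .xlsx file is a zip archive, and the sheet names are normally stored in xl/workbook.xml, so in principle they could be read without touching the worksheet data at all. Below is a minimal sketch using only the standard library. It assumes the usual (non-strict) SpreadsheetML namespace and the conventional xl/workbook.xml location; list_sheet_names is a hypothetical helper name, not part of any library.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumption: the workbook uses the standard (non-strict) SpreadsheetML namespace.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def list_sheet_names(path_or_file):
    """Return sheet names by parsing only xl/workbook.xml inside the archive.

    Hypothetical helper: accepts a file path or a binary file-like object.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        # Assumption: the workbook part lives at the conventional location.
        with archive.open("xl/workbook.xml") as workbook_xml:
            root = ET.parse(workbook_xml).getroot()
    # Each <sheet name="..."> element corresponds to one worksheet, in order.
    return [sheet.get("name") for sheet in root.iter(f"{{{MAIN_NS}}}sheet")]
```

This would only cover the sheet names. Column names would still require reading the first row of the worksheet XML, where header text is typically stored indirectly via the shared strings part, so I am not sure how much it helps for that half of the question.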
Thanks a lot!