I have a relatively large Excel file (.xlsx) with a single sheet containing over 100k rows across more than 350 columns, totaling 83 MB in file size.
I use the pandas method read_excel() to load the file, but it takes almost 5 minutes on average and consumes over 800 MB of memory.
import sys
import pandas as pd

excel_file = '/path/to/an_excel_file'

try:
    # Read the entire sheet into a DataFrame
    data = pd.read_excel(excel_file, engine='xlrd')
    process_data_further(data)
except FileNotFoundError:
    sys.exit(1)
As mentioned above, this works, but I find it slow and inefficient. Any ideas on how to optimize the import of this file?
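For example, one direction I was considering is paying the slow Excel read once and caching the data in a binary format such as Parquet, then loading the cache on subsequent runs. A rough sketch of what I mean (assuming pyarrow or fastparquet is installed; parquet_file and the column names are just placeholders):

import pandas as pd

excel_file = '/path/to/an_excel_file'
parquet_file = '/path/to/cached_data.parquet'  # hypothetical cache location

# One-time (slow) conversion: read the Excel sheet and cache it as Parquet
data = pd.read_excel(excel_file, engine='xlrd')
data.to_parquet(parquet_file)

# Subsequent runs: load the Parquet cache instead, optionally restricted
# to the columns actually needed (placeholder names below)
data = pd.read_parquet(parquet_file, columns=['col_a', 'col_b'])

Is something along these lines the right approach, or is there a better way to speed up the Excel import itself?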