2

I have a relatively large excel file (.xlsx) with one sheet that contains over 100k+ rows spanning over 350+ columns, totaling 83 MB in file size.

I use pandas method read_excel() to load the file up, but it takes on average almost 5 minutes to get this all done and eats up over 800 MB in memory.

excel_file = '/path/to/an_excel_file'
try:
    data = pd.read_excel(excel_path, engine='xlrd')
    process_data_further(data)
except FileNotFoundError:
    sys.exit(1)

As said above, this works, but I find it relatively slow and inefficient.

Any idea how to optimize the import of the file?

baduker
  • 19,152
  • 9
  • 33
  • 56
  • data already is a dataframe. why do you append it to an empty dataframe? – luigigi Dec 04 '19 at 11:22
  • 1
    see https://stackoverflow.com/questions/38623368/reading-a-portion-of-a-large-xlsx-file-with-python/38623545 and https://stackoverflow.com/questions/47455562/loading-excel-file-chunk-by-chunk-with-python-instead-of-loading-full-file-into/47455961#47455961 – David Dec 04 '19 at 11:27

1 Answers1

0

Edit : I am sorry, I read it differently. You want to read XLSX.

Faster way to read Excel files to pandas dataframe https://www.giacomodebidda.com/reading-large-excel-files-with-pandas/

I hope this might be helpful.

*************************** Below is used for reading CSV's****************

you can use chunksize parameter while reading it. Apart from Pandas, you can use Dask, Koalas, Voex and Modin to speedup the process. I personally prefer VOEX as it uses available RAM on your laptop.

refer to the link here for chunksize

Syenix
  • 208
  • 2
  • 9
  • The chunksize parameter has been deprecated and it's not used by pd.read_excel(), because of the nature of XLSX file format, which will be read up into memory as a whole during parsing. – baduker Dec 04 '19 at 11:32
  • Yes, You are right. I read it differently. I usually read my XLS as CSV instead of XLS call in pandas because they are just 1 sheet saved as XLS. Apologies for the misleading answer. – Syenix Dec 04 '19 at 11:38