Speed up importing a big excel file to a pandas data frame

Question

I have a relatively large excel file (.xlsx) with one sheet that contains over 100k+ rows spanning over 350+ columns, totaling 83 MB in file size.

I use pandas method read_excel() to load the file up, but it takes on average almost 5 minutes to get this all done and eats up over 800 MB in memory.

excel_file = '/path/to/an_excel_file'
try:
    data = pd.read_excel(excel_path, engine='xlrd')
    process_data_further(data)
except FileNotFoundError:
    sys.exit(1)

As said above, this works, but I find it relatively slow and inefficient.

Any idea how to optimize the import of the file?

data already is a dataframe. why do you append it to an empty dataframe? — luigigi, Dec 04 '19 at 11:22
see https://stackoverflow.com/questions/38623368/reading-a-portion-of-a-large-xlsx-file-with-python/38623545 and https://stackoverflow.com/questions/47455562/loading-excel-file-chunk-by-chunk-with-python-instead-of-loading-full-file-into/47455961#47455961 — David, Dec 04 '19 at 11:27

Syenix · Answer 1 · 2019-12-04T11:36:42.583

0

Edit : I am sorry, I read it differently. You want to read XLSX.

Faster way to read Excel files to pandas dataframe https://www.giacomodebidda.com/reading-large-excel-files-with-pandas/

I hope this might be helpful.

*************************** Below is used for reading CSV's****************

you can use chunksize parameter while reading it. Apart from Pandas, you can use Dask, Koalas, Voex and Modin to speedup the process. I personally prefer VOEX as it uses available RAM on your laptop.

refer to the link here for chunksize

edited Dec 04 '19 at 11:36

answered Dec 04 '19 at 11:25

Syenix

208
2
9

The chunksize parameter has been deprecated and it's not used by pd.read_excel(), because of the nature of XLSX file format, which will be read up into memory as a whole during parsing. – baduker Dec 04 '19 at 11:32
Yes, You are right. I read it differently. I usually read my XLS as CSV instead of XLS call in pandas because they are just 1 sheet saved as XLS. Apologies for the misleading answer. – Syenix Dec 04 '19 at 11:38

Speed up importing a big excel file to a pandas data frame

1 Answers1