I have an .xlsb file with 3 sheets, each with over 1 million rows, that I am trying to read in Python. I have tried 3 different methods: xlwings, pyxlsb and pyodbc. The fastest of the 3 is pyxlsb, but it still takes an overwhelming amount of time just to read the file: approx. 47 minutes to read and process it with the code below. Can anyone suggest a way to speed this up? I have 3 such files, and 47 minutes each is just too much wasted time.
import pandas as pd
from pyxlsb import open_workbook as open_xlsb

df = {}
with open_xlsb('Inverter Data 2018.xlsb') as wb:
    for sheet_name in wb.sheets:
        df[sheet_name] = []
        with wb.get_sheet(sheet_name) as sheet:
            for row in sheet.rows():
                df[sheet_name].append([item.v for item in row])
        df[sheet_name] = pd.DataFrame(df[sheet_name][1:], columns=df[sheet_name][0])
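One workaround I have been considering is to pay the slow .xlsb read only once and cache each sheet in a faster binary format (a pickle here), so repeat runs skip pyxlsb entirely. This is just a sketch, and the `cache_dir` layout and `load_sheet_cached` helper are my own naming; ideally I would still like the first read itself to be faster.

```python
import os

import pandas as pd


def load_sheet_cached(xlsb_path, sheet_name, cache_dir="cache"):
    """Load one sheet, reading the slow .xlsb only on the first run.

    Later runs hit the pickle cache, which pandas reads in seconds.
    The cache_dir / one-pickle-per-sheet layout is my own convention.
    """
    os.makedirs(cache_dir, exist_ok=True)
    cache_file = os.path.join(cache_dir, f"{sheet_name}.pkl")

    # Cache hit: no .xlsb parsing at all.
    if os.path.exists(cache_file):
        return pd.read_pickle(cache_file)

    # Cache miss: fall back to the slow pyxlsb read, then cache the result.
    from pyxlsb import open_workbook
    with open_workbook(xlsb_path) as wb:
        with wb.get_sheet(sheet_name) as sheet:
            rows = [[item.v for item in row] for row in sheet.rows()]
    df = pd.DataFrame(rows[1:], columns=rows[0])
    df.to_pickle(cache_file)
    return df
```

This doesn't make the initial 47-minute read any faster, but it would at least make every subsequent run of the processing script near-instant.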