I have an .xlsb file with 3 sheets, each with over 1 million rows, that I am trying to read in Python. I have tried 3 different methods: xlwings, pyxlsb and pyodbc. The fastest of the 3 is pyxlsb, but it still takes an overwhelming amount of time just to read the file: approx. 47 minutes to read and process it with the code below. Can anyone suggest a way to speed this up? I have 3 such files, and 47 minutes each is just too much wasted time.
import pandas as pd
from pyxlsb import open_workbook as open_xlsb

df = {}
with open_xlsb('Inverter Data 2018.xlsb') as wb:
    for sheet_name in wb.sheets:
        df[sheet_name] = []
        with wb.get_sheet(sheet_name) as sheet:
            for row in sheet.rows():
                df[sheet_name].append([item.v for item in row])
        df[sheet_name] = pd.DataFrame(df[sheet_name][1:], columns=df[sheet_name][0])
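One workaround I have been considering is to pay the slow .xlsb read only once and cache each sheet in a faster binary format (a pickle here), so repeat runs skip pyxlsb entirely. This is just a sketch, and the `cache_dir` layout and `load_sheet_cached` helper are my own naming; ideally I would still like the first read itself to be faster.

```python
import os

import pandas as pd


def load_sheet_cached(xlsb_path, sheet_name, cache_dir="cache"):
    """Load one sheet, reading the slow .xlsb only on the first run.

    Later runs hit the pickle cache, which pandas reads in seconds.
    The cache_dir / one-pickle-per-sheet layout is my own convention.
    """
    os.makedirs(cache_dir, exist_ok=True)
    cache_file = os.path.join(cache_dir, f"{sheet_name}.pkl")

    # Cache hit: no .xlsb parsing at all.
    if os.path.exists(cache_file):
        return pd.read_pickle(cache_file)

    # Cache miss: fall back to the slow pyxlsb read, then cache the result.
    from pyxlsb import open_workbook
    with open_workbook(xlsb_path) as wb:
        with wb.get_sheet(sheet_name) as sheet:
            rows = [[item.v for item in row] for row in sheet.rows()]
    df = pd.DataFrame(rows[1:], columns=rows[0])
    df.to_pickle(cache_file)
    return df
```

This doesn't make the initial 47-minute read any faster, but it would at least make every subsequent run of the processing script near-instant.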