
I have an .xlsb file with 3 sheets, each with over a million rows, and I am trying to read it in Python. I have tried 3 different methods - xlwings, pyxlsb and pyodbc. The fastest of the 3 is pyxlsb, but it still takes an overwhelming amount of time just to read the file: approximately 47 minutes to read and process it with the code below. Can anyone suggest a way to speed this up? I have 3 such files, and 47 minutes each is just too much wasted time.

import pandas as pd
from pyxlsb import open_workbook as open_xlsb

df = {}
with open_xlsb('Inverter Data 2018.xlsb') as wb:
    # wb.sheets is a list of sheet names
    for name in wb.sheets:
        with wb.get_sheet(name) as sheet:
            # collect raw cell values row by row (this loop dominates the runtime)
            rows = [[cell.v for cell in row] for row in sheet.rows()]
        # the first row holds the column headers
        df[name] = pd.DataFrame(rows[1:], columns=rows[0])
  • Does this answer your question? [Faster way to read Excel files to pandas dataframe](https://stackoverflow.com/questions/28766133/faster-way-to-read-excel-files-to-pandas-dataframe) – think-maths Jan 29 '21 at 10:42
  • sounds interesting, but haven't tried.. I have moved on to a different project now, but this is worth a try. I will let you know what I find out. thanks so much :-) – Mridul Bhardwaj Mar 18 '21 at 13:01

1 Answer


Can you try the code below:

import pandas as pd
df = pd.read_excel('Inverter Data 2018.xlsb', engine='pyxlsb', sheet_name='yourSheetName')
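Since the question involves all 3 sheets, the same `pd.read_excel` call can load every sheet in one go by passing `sheet_name=None`, which returns a dict mapping sheet names to DataFrames. A minimal sketch (the helper name is hypothetical; `engine='pyxlsb'` requires pandas >= 1.0 with the pyxlsb package installed):

```python
import pandas as pd


def read_all_sheets(path, engine=None):
    """Load every sheet of a workbook into a {sheet_name: DataFrame} dict.

    sheet_name=None tells pandas to return all sheets at once;
    pass engine='pyxlsb' for .xlsb files.
    """
    return pd.read_excel(path, sheet_name=None, engine=engine)


# e.g. frames = read_all_sheets('Inverter Data 2018.xlsb', engine='pyxlsb')
```

Note this delegates the parsing to the same pyxlsb engine under the hood, so it mainly saves boilerplate rather than runtime.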