2

I am trying to read an Excel sheet into a DataFrame using the pandas `read_excel` method. The Excel file contains 6-7 different sheets, 2-3 of which are very large. I only want to read one sheet from the file. If I copy that sheet out into its own file and read that instead, the read time drops by 90%.

I have read that xlrd, which pandas uses, always loads the whole sheet into memory. I cannot change the format of the input.

Can you please suggest a way to improve the performance?

Nithin Mohan
  • What about `xlsx = pd.ExcelFile('path_to_file.xls')` and `df = pd.read_excel(xlsx, 'Sheet1')`? – jezrael Dec 21 '17 at 10:31
  • This is what we are using currently. It seems to load all the sheets. [This](https://stackoverflow.com/questions/26521266/using-pandas-to-pd-read-excel-for-multiple-worksheets-of-the-same-workbook) StackOverflow question is the closest one to mine that I came across, but I don't think it solves the problem. – Nithin Mohan Dec 21 '17 at 10:34
  • What's wrong with `data_file = pd.read_excel('path_to_file.xls', sheetname="Sheet1")`? – SamuelNLP Dec 21 '17 at 12:22
  • see this question and adapt the answer to your problem. https://stackoverflow.com/questions/28766133/faster-way-to-read-excel-files-to-pandas-dataframe – SamuelNLP Dec 21 '17 at 12:26

3 Answers

0

It's quite simple. Just do this.

    import pandas as pd
    xls = pd.ExcelFile('C:/users/path_to_your_excel_file/Analysis.xlsx')
    df1 = pd.read_excel(xls, 'Sheet1')
    print(df1)
    # etc.
    df2 = pd.read_excel(xls, 'Sheet2')
    print(df2)

ASH
0

    import pandas as pd
    df = pd.read_excel('YourFile.xlsx', sheet_name='YourSheet_Name')

Pass the path to your Excel file and the name of the sheet you want to read.

Ashu007
-1

Use openpyxl in read-only mode. See http://openpyxl.readthedocs.io/en/default/pandas.html
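
A minimal sketch of what that could look like, assuming the input is an `.xlsx` file (openpyxl does not read the old `.xls` format). The helper name `read_sheet` and the assumption that the first row holds the column headers are mine, not from the linked page. In read-only mode the worksheet rows are streamed on demand, so this should avoid parsing the cell data of the other, larger sheets, though I have not benchmarked it against the asker's file.

```python
import pandas as pd
from openpyxl import load_workbook


def read_sheet(path, sheet_name):
    """Read one worksheet of an .xlsx file into a DataFrame.

    Hypothetical helper: assumes the first row of the sheet is the header.
    """
    # read_only=True streams rows lazily instead of building the whole
    # workbook in memory; data_only=True returns cached formula results.
    wb = load_workbook(path, read_only=True, data_only=True)
    try:
        ws = wb[sheet_name]
        rows = ws.values              # generator of tuples of cell values
        header = next(rows)           # first row -> column names
        return pd.DataFrame(list(rows), columns=list(header))
    finally:
        # Read-only workbooks keep the file handle open until closed.
        wb.close()
```

Usage would then be something like `df = read_sheet('YourFile.xlsx', 'YourSheet_Name')`.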

Charlie Clark