Yes, it's absolutely possible. That is indeed a lot of data to be in an Excel file. By default, xlrd
loads the entire workbook into memory. If your workbook is a .xls file, you can use the on_demand
parameter to only open worksheets as they are needed:
import xlrd

def processExcel(excelFile):
    # on_demand=True defers loading worksheet contents until requested
    excelData = xlrd.open_workbook(excelFile, on_demand=True)
    sheets = excelData.sheet_names()
    print(sheets)
If you are trying to open a .xlsx file, the on_demand parameter has no effect.
Update
If you are using Python 3 and reading a .xlsx file, you can try sxl. This is a library that reads data into memory only as needed, so just opening the workbook to retrieve the worksheet names is very quick. Also, if you just need the first few rows of a worksheet, it can fetch those rather quickly as well.
If you need to read all the data with sxl, you have to iterate over all the rows, which could be even slower than xlrd, but at least will only use up as much memory as you need. For example, the following code will only keep one row in memory at any given time:
from sxl import Workbook

wb = Workbook('MyBigFile.xlsx')
ws = wb.sheets[1]
for row in ws.rows:
    print(row)
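To make the streaming pattern concrete without needing sxl or an .xlsx file on disk, here is a plain-Python sketch; fake_rows is a made-up generator standing in for ws.rows:

```python
def fake_rows():
    # Yields rows one at a time, like sxl's ws.rows generator.
    for i in range(5):
        yield [i, i * 10]

total = 0
count = 0
for row in fake_rows():   # only the current row is held in memory
    total += row[1]
    count += 1

print(count, total)
```

Because a generator produces each row only when the loop asks for it, aggregates like counts and sums can be computed over an arbitrarily large sheet with constant memory.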
However, if you need random access to all the rows to do your processing, you'll have to keep them all in memory:
from sxl import Workbook
wb = Workbook('MyBigFile.xlsx')
ws = wb.sheets[1]
all_rows = list(ws.rows)
In this case, all_rows keeps the entire sheet in memory. If your workbook has multiple sheets, this may still be more efficient than xlrd. But if you need your whole workbook in memory, then you might as well stick to xlrd.
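The memory trade-off between the two approaches can be measured with the standard-library tracemalloc module; the rows generator below is a stand-in for a worksheet, not part of sxl:

```python
import tracemalloc

def rows(n):
    # Stand-in for a worksheet: yields one row at a time.
    for i in range(n):
        yield [i, 'value-%d' % i]

# Materializing every row (like list(ws.rows)) holds them all at once.
tracemalloc.start()
all_rows = list(rows(100_000))
list_peak = tracemalloc.get_traced_memory()[1]
tracemalloc.stop()

# Streaming (like `for row in ws.rows`) holds only the current row.
tracemalloc.start()
count = sum(1 for _ in rows(100_000))
stream_peak = tracemalloc.get_traced_memory()[1]
tracemalloc.stop()

print(stream_peak < list_peak)  # streaming peak is far smaller
```

The peak allocation for the list version grows with the number of rows, while the streaming version stays roughly constant regardless of sheet size.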