Trouble loading large excel file (Python)

Question

I am trying to load a large .xlsx file using OpenPyXL, while loading a 80 MB .xlsx file my entire 8 GB of memory is getting full

Excel file has 4 sheets with 800 000 rows.

from openpyxl import load_workbook
wb = load_workbook('Record.xlsx')

def process(ws):
    '''
    Read all rows of a worksheet
    '''
    data = []
    for a, c, f, k in zip(ws['A'], ws['C'], ws['F'], ws['K']):
        data.append([a, c, f, k])

    return data

ws1 = wb.worksheets[0] # Sheet 1
data1 = process(ws1)
ws2 = wb.worksheets[1] # Sheet 2
data2 = process(ws2)
ws3 = wb.worksheets[2] # Sheet 3
data3 = process(ws3)
ws4 = wb.worksheets[3] # Sheet 4
data4 = process(ws4)

Why while loading 80 MB of excel file 8 GB of memory is not enough?

Can you show us your code, it looks like something is causing a memory leak maybe ? — BcK, Apr 20 '18 at 11:57
is it really that you have only these 2 rows in your code? may be parsing later does give you the issue? I never had problems reading; and for parsing use generators(yield) — Drako, Apr 20 '18 at 11:57
Have you tried `read_only = True` to use [read only mode](http://openpyxl.readthedocs.io/en/latest/optimized.html)? — Colin Ricardo, Apr 20 '18 at 11:57
Possible duplicate of [Fastest Way To Run Through 50k Lines of Excel File in OpenPYXL](https://stackoverflow.com/questions/35638117/fastest-way-to-run-through-50k-lines-of-excel-file-in-openpyxl) — Charlie Clark, Apr 20 '18 at 12:08
@Colin While setting `read_only = True`, I am getting this error `AttributeError: 'ReadOnlyWorksheet' object has no attribute 'iter_cols'` — arush1836, Apr 20 '18 at 13:18

score 2 · Accepted Answer · answered Apr 20 '18 at 11:58

2

Try using the read_only = True argument to use read only mode, as described here.

answered Apr 20 '18 at 11:58

Colin Ricardo

16,488
11
47
80

While setting `read_only = True`, I am getting this error `AttributeError: 'ReadOnlyWorksheet' object has no attribute 'iter_cols'` – arush1836 Apr 20 '18 at 12:55
@User Read-only mode doesn't support iterating through columns, only rows. There's a suggested way to do it [here](https://stackoverflow.com/a/47584317/8142697). – Colin Ricardo Apr 20 '18 at 13:24
I used `for row in ws.rows` to access the rows and it worked. Thank you – arush1836 Apr 20 '18 at 17:37
Glad I could help :) – Colin Ricardo Apr 20 '18 at 17:52

Trouble loading large excel file (Python)

1 Answers1