0

I am trying to load a large .xlsx file using OpenPyXL, while loading a 80 MB .xlsx file my entire 8 GB of memory is getting full

enter image description here

Excel file has 4 sheets with 800 000 rows.

from openpyxl import load_workbook
wb = load_workbook('Record.xlsx')

def process(ws):
    '''
    Read all rows of a worksheet
    '''
    data = []
    for a, c, f, k in zip(ws['A'], ws['C'], ws['F'], ws['K']):
        data.append([a, c, f, k])

    return data

ws1 = wb.worksheets[0] # Sheet 1
data1 = process(ws1)
ws2 = wb.worksheets[1] # Sheet 2
data2 = process(ws2)
ws3 = wb.worksheets[2] # Sheet 3
data3 = process(ws3)
ws4 = wb.worksheets[3] # Sheet 4
data4 = process(ws4)

Why while loading 80 MB of excel file 8 GB of memory is not enough?

James Z
  • 12,209
  • 10
  • 24
  • 44
arush1836
  • 1,327
  • 8
  • 19
  • 37
  • Can you show us your code, it looks like something is causing a memory leak maybe ? – BcK Apr 20 '18 at 11:57
  • is it really that you have only these 2 rows in your code? may be parsing later does give you the issue? I never had problems reading; and for parsing use generators(yield) – Drako Apr 20 '18 at 11:57
  • 1
    Have you tried `read_only = True` to use [read only mode](http://openpyxl.readthedocs.io/en/latest/optimized.html)? – Colin Ricardo Apr 20 '18 at 11:57
  • Possible duplicate of [Fastest Way To Run Through 50k Lines of Excel File in OpenPYXL](https://stackoverflow.com/questions/35638117/fastest-way-to-run-through-50k-lines-of-excel-file-in-openpyxl) – Charlie Clark Apr 20 '18 at 12:08
  • @BcK I have updated the code – arush1836 Apr 20 '18 at 12:45
  • @Colin While setting `read_only = True`, I am getting this error `AttributeError: 'ReadOnlyWorksheet' object has no attribute 'iter_cols'` – arush1836 Apr 20 '18 at 13:18

1 Answers1

2

Try using the read_only = True argument to use read only mode, as described here.

Colin Ricardo
  • 16,488
  • 11
  • 47
  • 80