I use openpyxl and numpy to read large Excel files. The code looks like this:
from openpyxl import load_workbook
import numpy as np

W = load_workbook(fname, read_only=True)
p = W.worksheets[0]

a = []                  # flat list of all cell values
m = p.max_row
n = p.max_column
for row in p.iter_rows():
    for k in row:
        a.append(k.value)

# convert list a to an m*n matrix (for example 5*6)
aa = np.resize(a, [m, n])
For medium-sized files (a 4 MB Excel file with 16000 rows and 50 columns) this works fine. However, for large files (21 MB with 100000 rows and 50 columns), numpy fails with a MemoryError, even though there is memory available on the system.
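My rough estimate of the sizes involved, assuming every cell value ends up as a separate Python float object in the list before numpy copies it (that is my assumption, the real cells may be ints, strings or None), is:

# Back-of-envelope estimate, assuming 64-bit CPython and float cell values.
rows, cols = 100000, 50
cells = rows * cols                     # 5,000,000 values
list_overhead = cells * (8 + 24)        # 8-byte list pointer + ~24-byte float object
array_size = cells * 8                  # float64 array after np.resize (an object array would be bigger)
print(list_overhead / 1e6, "MB roughly held by the Python list")
print(array_size / 1e6, "MB roughly needed for the final array")

So I would expect something on the order of a few hundred MB in total, which the system should have.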
1- How can I find out how much memory the resize to a matrix actually took? (My only idea so far is the tracemalloc sketch after the traceback below.)
2- How can I increase the memory available to the Python process (something like the heap size in Java)?

Here is the traceback:
Traceback (most recent call last):
  File "exread.py", line 26, in <module>
    aa= np.resize(a, [m, n])
  File "C:\Users\m.naderan\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\core\fromnumeric.py", line 1121, in resize
    a = ravel(a)
  File "C:\Users\m.naderan\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\core\fromnumeric.py", line 1468, in ravel
    return asanyarray(a).ravel(order=order)
  File "C:\Users\m.naderan\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\core\numeric.py", line 583, in asanyarray
    return array(a, dtype, copy=False, order=order, subok=True)
MemoryError
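For question 1, the only measurement I have tried to sketch is wrapping the failing call with the standard-library tracemalloc module and reading the peak. I am not sure this captures numpy's internal allocations (that part is my assumption), but roughly:

import tracemalloc
import numpy as np

# a, m, n are the same variables as in the code above
tracemalloc.start()
try:
    aa = np.resize(a, [m, n])   # the call that raises MemoryError for the large file
finally:
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
print("current: %.1f MB, peak: %.1f MB" % (current / 1e6, peak / 1e6))

Is this a reasonable way to measure it, or is there a better tool for this?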