I have a huge file (around 30GB); each line contains the coordinates of a point on a 2D surface. I need to load the file into a Numpy array, points = np.empty((0, 2)), and apply scipy.spatial.ConvexHull over it. Since the file is very large I can't load it into memory all at once, so I want to load it in batches of N lines, apply scipy.spatial.ConvexHull to each small part, and then load the next N rows. What's an efficient way to do it?
I found out that in Python you can use islice to read N lines of a file, but the problem is that lines_gen is a generator object, which yields each line of the file and is meant to be used in a loop, so I am not sure how to convert lines_gen into a Numpy array in an efficient way.
from itertools import islice
with open(input, 'r') as infile:
    lines_gen = islice(infile, N)
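One idea I have (just a sketch; I am assuming np.loadtxt can parse a batch of text lines, and points.txt / N below are placeholder names) is to call islice repeatedly in a loop, turn each batch into an array, and keep only the hull vertices found so far:

import numpy as np
from itertools import islice
from scipy.spatial import ConvexHull

N = 1_000_000                     # batch size (placeholder)
hull_points = np.empty((0, 2))    # candidate hull vertices found so far

with open('points.txt', 'r') as infile:
    while True:
        lines = list(islice(infile, N))      # next N lines (fewer at EOF)
        if not lines:
            break
        batch = np.loadtxt(lines, ndmin=2)   # parse the text lines into an (M, 2) array
        # hull of (previous hull vertices + new batch) = hull of everything seen so far
        candidates = np.vstack((hull_points, batch))
        hull = ConvexHull(candidates)
        hull_points = candidates[hull.vertices]

# hull_points should now hold the hull vertices of the whole file
print(hull_points)

I am not sure whether rebuilding the hull on every batch like this, or going through np.loadtxt for each chunk, is actually efficient.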
My input file:
0.989703 1
0 0
0.0102975 0
0.0102975 0
1 1
0.989703 1
1 1
0 0
0.0102975 0
0.989703 1
0.979405 1
0 0
0.020595 0
0.020595 0
1 1
0.979405 1
1 1
0 0
0.020595 0
0.979405 1
0.969108 1
...
...
...
0 0
0.0308924 0
0.0308924 0
1 1
0.969108 1
1 1
0 0
0.0308924 0
0.969108 1
0.95881 1
0 0