This follows up on a previous question: to improve the time performance of a function in Python, I need to find an efficient way to split my text file.
I have the following text file (more than 32 GB), not sorted:
....................
0 274 593869.99 6734999.96 121.83 1,
0 273 593869.51 6734999.92 121.57 1,
0 273 593869.15 6734999.89 121.57 1,
0 273 593868.79 6734999.86 121.65 1,
0 272 593868.44 6734999.84 121.65 1,
0 273 593869.00 6734999.94 124.21 1,
0 273 593868.68 6734999.92 124.32 1,
0 274 593868.39 6734999.90 124.44 1,
0 275 593866.94 6734999.71 121.37 1,
0 273 593868.73 6734999.99 127.28 1,
.............................
The first and second columns are the ID (e.g. 0, 273) of the grid cell that contains the x, y, z point.
def point_grid_id(x, y, minx, maxy, distx, disty):
    """Give the ID (row, col) of the point (x, y)."""
    col = int((x - minx) / distx)
    row = int((maxy - y) / disty)
    return (row, col)
(minx, maxy) is the origin of my grid and distx, disty are the tile sizes. The tile IDs are
import numpy as np

tiles_id = [j for j in np.ndindex(ny, nx)]  # ny = number of rows, nx = number of columns
i.e. [(0,0), (0,1), (0,2), ..., (ny-1, nx-1)], and
n = len(tiles_id)
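For illustration, this is how a single data row maps to a tile ID with point_grid_id. The grid parameters below are made-up values (chosen only so that the first sample row lands on its stated ID (0, 274)); the real minx, maxy, distx, disty of my grid are different.

# Hypothetical grid parameters, for illustration only.
minx, maxy = 593595.0, 6735000.0
distx, disty = 1.0, 1.0

line = "0 274 593869.99 6734999.96 121.83 1,"
fields = line.split()
x, y = float(fields[2]), float(fields[3])

print(point_grid_id(x, y, minx, maxy, distx, disty))  # -> (0, 274)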
I need to split the ~32 GB file into n (= len(tiles_id)) files.
I can do this without sorting, but only by reading the file n times. For this reason I wish to find an efficient way to split the file in a single pass, starting from tile (0,0) (= tiles_id[0]). After that I only need to read each of the split files once.
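A minimal sketch of the single-pass splitting I have in mind (untested; the function name split_by_tile, the output file naming and the buffer_lines parameter are only illustrative): read each line once, compute its tile ID with point_grid_id, and append the line to that tile's file, flushing buffered lines in batches so that the output files do not all have to stay open at once.

import os
from collections import defaultdict

def split_by_tile(path, out_dir, minx, maxy, distx, disty, buffer_lines=100000):
    """One pass over the big file: append every line to the file of its tile."""
    if not os.path.exists(out_dir):
        os.makedirs(out_dir)
    buffers = defaultdict(list)  # (row, col) -> lines not yet written

    def flush(tile):
        out_path = os.path.join(out_dir, "tile_%d_%d.txt" % tile)
        with open(out_path, "a") as out:
            out.writelines(buffers[tile])
        buffers[tile] = []

    with open(path) as f:
        for line in f:
            fields = line.split()
            x, y = float(fields[2]), float(fields[3])
            tile = point_grid_id(x, y, minx, maxy, distx, disty)
            buffers[tile].append(line)
            if len(buffers[tile]) >= buffer_lines:
                flush(tile)

    for tile in list(buffers):  # write out whatever is still buffered
        if buffers[tile]:
            flush(tile)

Opening each output file in append mode only when its buffer is flushed keeps just one file handle open at a time, at the cost of repeated open/close calls; buffer_lines trades memory for fewer of those calls. Is there a better approach than this?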