
I have XYZ text files that need to be gridded. For each XYZ file I have the origin coordinates, the cell size and the number of rows/columns. However, records with no z value are missing from the XYZ file, so simply reshaping the records that are present into a grid fails because of the gaps. So I tried this:

import numpy as np

nxyz = np.loadtxt(infile, delimiter=",", skiprows=1)

ncols = 4781
nrows = 4405
xllcorner = 682373.533843
yllcorner = 205266.898604
cellsize = 1.25

grid = np.zeros((nrows,ncols))

for item in nxyz:
    idx = int((item[0] - xllcorner) / cellsize)  # column index from x
    idy = int((item[1] - yllcorner) / cellsize)  # row index from y
    grid[idy, idx] = item[2]

# flip vertically so the first output row is the northern edge
np.savetxt(r"e:\test\myrasout.txt", grid[::-1], fmt="%.2f", delimiter=" ")

This gets me the grid, with zeroes where no records are present in the XYZ file. It works for smaller files, but I got an out-of-memory error for a 290 MB file (~8,900,000 records). And this is not the largest file I have to process.

So I tried another (iterative) approach by Joe Kington that I found here for loading the XYZ file. This worked for the 290 MB file, but failed with an out-of-memory error on the next bigger one (533 MB, ~15,600,000 records).
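A generator-based loader along those lines looks roughly like this (a sketch for illustration only; the original I linked to may differ):

import numpy as np

def iter_loadtxt(filename, delimiter=",", skiprows=1, dtype=float):
    # yield one value at a time, so no intermediate Python lists
    # of the whole file are built up in memory
    def iter_func():
        with open(filename) as infile:
            for _ in range(skiprows):
                next(infile)
            for line in infile:
                for item in line.rstrip().split(delimiter):
                    yield dtype(item)
    data = np.fromiter(iter_func(), dtype=dtype)
    return data.reshape((-1, 3))  # assumes three columns: x, y, z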

How can I grid these larger files correctly (accounting for the missing records) without running out of memory?

rr5577
  • How about using a [generator](http://wiki.python.org/moin/Generators)? The for loop holds all elements in memory whereas a generator will only memorize the current element it processes. – LarsVegas Oct 03 '12 at 12:35
  • Can you provide the source of the np.* functions? – Tommaso Barbugli Oct 03 '12 at 12:40
  • @larsvegas do you mean that the grid dictionary holds the results in memory, or something else? – Tommaso Barbugli Oct 03 '12 at 12:44
  • @larsvegas, my code doesn't even get to the for loop, it runs out of memory at the np.loadtxt already. If I'm not mistaken (Python amateur), the iterative approach I linked to and used in the second try uses a generator. – rr5577 Oct 03 '12 at 12:53
  • As far as I understand, if using a for loop all items are stored in memory **until** the for loop is completed. A generator frees an item once it was processed. – LarsVegas Oct 03 '12 at 12:53
  • @Tommaso do you mean this? [np.loadtxt](http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html), [np.zeros](http://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html), [np.savetxt](http://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html) – rr5577 Oct 03 '12 at 12:58
  • 1
    Just use `readline()` then to not read the whole file into memory first. – LarsVegas Oct 03 '12 at 12:59

2 Answers


Based on the comments, I'd change the code to:

ncols = 4781
nrows = 4405
xllcorner = 682373.533843
yllcorner = 205266.898604
cellsize = 1.25
grid = np.zeros((nrows,ncols))

with open(infile) as f:
    next(f)  # skip the header line
    for line in f:
        item = [float(v) for v in line.split(",")]  # adjust the separator to whatever your file uses
        idx = int((item[0] - xllcorner) / cellsize)
        idy = int((item[1] - yllcorner) / cellsize)
        grid[idy, idx] = item[2]
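This streams the file one line at a time, so peak memory use is dominated by the output grid itself (nrows × ncols × 8 bytes ≈ 168 MB as float64 for the dimensions above), independent of the input file size.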
LarsVegas

You can do fancy indexing with NumPy. Try using something like this, instead of the loop which is probably the root of your problem:

grid = np.zeros((nrows, ncols))
# note: NumPy indexes as grid[row, column], i.e. grid[y, x], and the indices must be integers
grid[nxyz[:, 1].astype(int), nxyz[:, 0].astype(int)] = nxyz[:, 2]

With the origin and cell size conversion, it is a bit more involved:

grid = np.zeros((nrows, ncols))
idx = ((nxyz[:, 0] - xllcorner) / cellsize).astype(int)
idy = ((nxyz[:, 1] - yllcorner) / cellsize).astype(int)
grid[idy, idx] = nxyz[:, 2]

If this doesn't help, the nxyz array itself is too big, but I doubt that. If it is, then you could load the text file in several parts and do the above for each part sequentially, as sketched below.
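A minimal sketch of that chunked variant (the grid_chunked name and the chunklines value are illustrative, not from the original answer):

from itertools import islice

import numpy as np

def grid_chunked(path, grid, xllcorner, yllcorner, cellsize, chunklines=1000000):
    with open(path) as f:
        next(f)  # skip the header line
        while True:
            lines = list(islice(f, chunklines))  # read at most chunklines lines
            if not lines:
                break
            chunk = np.loadtxt(lines, delimiter=",", ndmin=2)  # loadtxt accepts a list of lines
            idx = ((chunk[:, 0] - xllcorner) / cellsize).astype(int)
            idy = ((chunk[:, 1] - yllcorner) / cellsize).astype(int)
            grid[idy, idx] = chunk[:, 2]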

P.S. You probably know the range of the data contained in your text files, and you can limit memory usage by stating it explicitly while loading the file, e.g. `np.loadtxt("myfile.txt", dtype=np.int16)` if you are dealing with at most 16-bit integers.
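For instance, since the output is written with `%.2f` anyway, the grid itself could be single precision, halving its footprint (an illustration, not part of the original answer; keep x/y in float64, as the ~12-significant-digit coordinates here exceed float32 precision):

grid = np.zeros((nrows, ncols), dtype=np.float32)  # ~84 MB instead of ~168 MB for 4405 x 4781 cells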

Karol