
I have a classified raster (n classes) that I am reading into a NumPy array.

I want to use a 2D moving window (e.g. 3 by 3) to create an n-dimensional vector that stores the % cover of each class within the window. Because the raster is large, it would be useful to store this information so it does not have to be re-computed each time, so I think the best solution is to create a 3D array to act as the vector. A new raster will then be created based on these %/count values.

My idea is to:

1) create a 3D array with n+1 'bands'

2) band 1 = the original classified raster; each other 'band' holds the count of cells of one class value within the window (i.e. one band per class). For example:

[[2 0 1 2 1]
 [2 0 2 0 0]
 [0 1 1 2 1]
 [0 2 2 1 1]
 [0 1 2 1 1]]
[[2 2 3 2 2]
 [3 3 3 2 2]
 [3 3 2 2 2]
 [3 3 0 0 0]
 [2 2 0 0 0]]
[[0 1 1 2 1]
 [1 3 3 4 2]
 [1 2 3 4 3]
 [2 3 5 6 5]
 [1 1 3 4 4]]
[[2 3 2 2 1]
 [2 3 3 3 2]
 [2 4 4 3 1]
 [1 3 5 3 1]
 [1 3 3 2 0]]

3) write these bands into a VRT so it only needs to be created once and can be read in by further modules (see the sketch after this list for the intended layout).
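To make the intended layout concrete, here is a minimal sketch (illustrative only; the count bands are left as zero placeholders because how to fill them efficiently is exactly what I am asking about):

import numpy as np

# the small example raster from above (3 classes: 0, 1, 2)
data = np.array([[2, 0, 1, 2, 1],
                 [2, 0, 2, 0, 0],
                 [0, 1, 1, 2, 1],
                 [0, 2, 2, 1, 1],
                 [0, 1, 2, 1, 1]])

n_classes = 3

# placeholder: one count band per class, same shape as the raster
count_bands = [np.zeros_like(data) for _ in range(n_classes)]

# band 1 = the original raster, bands 2..n+1 = per-class window counts
stacked = np.stack([data] + count_bands)
print(stacked.shape)   # (4, 5, 5)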

Question: what is the most efficient 'moving window' method to 'count' within the window?

Currently I am trying, and failing, with the following code:

import numpy as np
from osgeo import gdal, gdal_array
from scipy import ndimage

def lcc_binary_vrt(raster, dim, bands):
    # dim x dim window of ones
    footprint = np.ones((dim, dim), dtype=int)
    g = gdal.Open(raster)
    data = gdal_array.DatasetReadAsArray(g)

    # loop through the class values
    for i in bands:
        print(i)
        # create a duplicate '0' array of the raster
        a_band = data*0
        # we create the binary dataset for the band
        a_band = np.where(data == i, 1, a_band)
        count_a_band_fname = raster[:-4] + '_' + str(i) + '.tif'
        # run the moving window (footprint) across the band to create a 'count'
        count_a_band = ndimage.generic_filter(a_band, np.count_nonzero,
                                              footprint=footprint, mode='constant')
        # geoTiff.create is my own helper for writing the result to a GeoTIFF
        geoTiff.create(count_a_band_fname, g, data, count_a_band,
                       gdal.GDT_Byte, np.nan)

Any suggestions very much appreciated.

Becky

  • I haven't begun coding this module at the moment - I am reading around all of the options and requesting advice beforehand (the only part I request advice on is how to efficiently calculate % cover of a value in a 2d moving window). ...the best option I can see at the moment is to use scipy.ndimage.measurements.histogram somehow... – user3770062 Nov 07 '14 at 08:50
  • Until you have some sort of semi-working code, your question is probably not ready for SO. – John Zwinck Nov 07 '14 at 12:06

1 Answer


I don't know anything about the spatial sciences stuff, so I'll just focus on the main question :)

what is the most efficient 'moving window' method to 'count' within the window?

A common way to do moving window statistics with NumPy is to use numpy.lib.stride_tricks.as_strided; see, for example, this answer. Basically, the idea is to make an array containing all the windows, without any increase in memory usage:

from numpy.lib.stride_tricks import as_strided

...

m, n = a_band.shape
newshape = (m - dim + 1, n - dim + 1, dim, dim)
newstrides = a_band.strides * 2  # strides is a tuple, so this just repeats it
count_a_band = as_strided(a_band, newshape, newstrides).sum(axis=(2, 3))
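On recent NumPy versions (1.20+), numpy.lib.stride_tricks.sliding_window_view wraps the same trick in a safer interface; a minimal sketch, assuming a new enough NumPy:

from numpy.lib.stride_tricks import sliding_window_view

# one dim x dim window per valid position: shape (m - dim + 1, n - dim + 1, dim, dim)
windows = sliding_window_view(a_band, (dim, dim))
count_a_band = windows.sum(axis=(2, 3))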

However, for your use case this method is inefficient, because you're summing the same numbers over and over again, especially if the window size increases. A better way is to use a cumsum trick, like in this answer:

def windowed_sum_1d(ar, ws, axis=None):
    # windowed sum of length ws along one axis, using a cumulative sum
    if axis is None:
        ar = ar.ravel()
    else:
        ar = np.swapaxes(ar, axis, 0)

    # difference of cumulative sums = sum over each window of ws elements
    ans = np.cumsum(ar, axis=0)
    ans[ws:] = ans[ws:] - ans[:-ws]

    # drop the first ws-1 (incomplete) windows
    ans = ans[ws-1:]

    if axis:
        ans = np.swapaxes(ans, 0, axis)

    return ans


def windowed_sum(ar, ws):
    # apply the 1D windowed sum along every axis in turn
    for axis in range(ar.ndim):
        ar = windowed_sum_1d(ar, ws, axis)
    return ar

...

count_a_band = windowed_sum(a_band, dim)

Note that in both snippets above it would be tedious to handle the edge cases. Luckily, there is an easy way to include them and get the same efficiency as the second approach:

count_a_band = ndimage.uniform_filter(a_band, size=dim, mode='constant') * dim**2

Though very similar to what you already had, this will be much faster! The downside is that you may need to round to integers to get rid of floating point rounding errors.
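As a quick sanity check (illustrative only; the random test array and class value here are made up), the uniform_filter result agrees with the windowed_sum function above on the interior of the raster:

import numpy as np
from scipy import ndimage

dim = 3
data = np.random.randint(0, 3, size=(6, 7))
a_band = (data == 1).astype(float)

# full-size result, zero-padded at the edges
full = np.rint(ndimage.uniform_filter(a_band, size=dim, mode='constant') * dim**2).astype(int)

# 'valid' windows only, via the cumsum trick
valid = windowed_sum(a_band, dim).astype(int)

r = dim // 2
assert np.array_equal(full[r:-r, r:-r], valid)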

As a final note, your code

# create a duplicate '0' array of the raster
a_band = data*0
# we create the binary dataset for the band        
a_band = np.where(data == i, 1, a_band)

is a bit redundant: You can just use a_band = (data == i).
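Putting the pieces together, a rough sketch of how the per-class % cover bands could be computed and stacked with the original raster (the function name, class list and band layout here are my own assumptions, not from your code):

import numpy as np
from scipy import ndimage

def percent_cover_stack(data, classes, dim):
    # band 0 = original raster, bands 1..n = % cover per class within the dim x dim window
    bands = [data.astype(float)]
    for i in classes:
        a_band = (data == i).astype(float)
        counts = np.rint(ndimage.uniform_filter(a_band, size=dim, mode='constant') * dim**2)
        bands.append(100.0 * counts / dim**2)
    return np.stack(bands)

# e.g. stacked = percent_cover_stack(data, classes=[0, 1, 2], dim=3)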
