
I have asked a similar question here.
That question was about resampling TIFF data; here I want to find a more universal way to deal with it.

My question

For example:

  • A 2-D numpy array with shape 200x150 representing population density data at 1 km x 1 km resolution.

http://i4.tietuku.com/060fe38e9ccf6daf.png

  • My target: change the spatial resolution to 5 km x 5 km.

Here is an example picture for randomly distributed data clustered onto a grid network: http://i4.tietuku.com/a4c1355413f8cdbc.png
* the red points: original data
* the blue dots: grid network representing the 2-D array
* the green circles: find the nearest blue dot for each red point and sum the values there
* In this question, the difference is that the original data is a 2-D numpy array too.
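The nearest-node aggregation described above can be sketched with SciPy's `cKDTree` plus `np.bincount` (the scattered data and grid here are synthetic, purely for illustration):

```python
import numpy as np
from scipy.spatial import cKDTree

# Synthetic scattered data: (x, y) positions with a value at each point
rng = np.random.default_rng(0)
points = rng.random((100, 2)) * 10          # the "red points"
values = rng.random(100)                    # e.g. population per point

# A regular 5x5 grid of nodes (the "blue dots")
gx, gy = np.meshgrid(np.linspace(0, 10, 5), np.linspace(0, 10, 5))
nodes = np.column_stack([gx.ravel(), gy.ravel()])

# For each scattered point, find its nearest grid node,
# then sum the values that land on each node
tree = cKDTree(nodes)
_, idx = tree.query(points)
grid_sum = np.bincount(idx, weights=values, minlength=len(nodes))
grid_sum = grid_sum.reshape(gx.shape)
```

Because every point is assigned to exactly one node, the total of `grid_sum` equals the total of `values`.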

My solution

  • This is similar to another question of mine here, in which I clustered 2-D scatter points onto the nearest grid points. I appreciate the answer from @HYRY, which improved my code a lot.

  • In that question, I used a KD-tree algorithm to find the nearest network node for each original data point. The result is shown here:
    http://i4.tietuku.com/1a420c48ed7bcb1c.png

  • I think there must be an easier way to resample a structured 2-D numpy array than to treat it as random 2-D scatter points.

Add 2016-01-09

Thanks for the answer from @Praveen.
I also tried another method using scipy's 2-D interpolation function.

This is my code:

 import numpy as np
 from scipy import interpolate
 import matplotlib.pyplot as plt

 # coordinates of the original (fine) grid
 xi = np.linspace(x_map1, x_map2, pop.shape[1])
 yi = np.linspace(y_map1, y_map2, pop.shape[0])
 hfunc = interpolate.interp2d(xi, yi, pop)

 # coordinates of the coarser target grid
 x_grid = np.linspace(x_map1, x_map2, new_shape_x)
 y_grid = np.linspace(y_map1, y_map2, new_shape_y)

 new_pop = np.zeros(new_shape_x * new_shape_y)
 t = 0
 for i in range(new_shape_y):
     for j in range(new_shape_x):  # was range(new_shape_y), a bug for non-square grids
         new_pop[t] = hfunc(x_grid[j], y_grid[i])
         t += 1
 new_pop = new_pop.reshape(new_shape_y, new_shape_x)
 plt.pcolormesh(new_pop)

The result shows like:
http://i4.tietuku.com/b020db6dc2d75d70.png

  • Are there any problems with using interpolation to coarsen the data?
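One concrete caveat worth noting (my own observation, not from the original post): interpolation samples point values, so unlike block summing it does not conserve the total population. A toy comparison:

```python
import numpy as np

# Toy 4x4 "density" array, coarsened by a factor of 2
pop = np.arange(16, dtype=float).reshape(4, 4)

# Block-sum coarsening conserves the total
block_sum = pop.reshape(2, 2, 2, 2).sum(axis=(1, 3))
print(block_sum.sum() == pop.sum())    # True

# Point sampling (which is what interpolation amounts to when the new
# grid points coincide with old grid nodes)
sampled = pop[::2, ::2]
print(sampled.sum() == pop.sum())      # False: the total is not conserved
```

So for a density map where totals matter, aggregation (summing or area-weighted averaging) is usually the safer choice than interpolation.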

Add 2

Is there a useful function that can sample data from the original array by location (x, y)?
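One option worth considering (a suggestion on my part, with illustrative coordinates) is `scipy.interpolate.RegularGridInterpolator`, which samples a gridded array at arbitrary (y, x) locations:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Illustrative array indexed by y (rows) and x (columns)
data = np.arange(12, dtype=float).reshape(3, 4)
y = np.linspace(0.0, 2.0, 3)
x = np.linspace(0.0, 3.0, 4)

sampler = RegularGridInterpolator((y, x), data)

# Sample at arbitrary (y, x) locations inside the grid
samples = sampler([(0.0, 0.0), (1.0, 2.0)])
# At the grid nodes themselves, the samples equal the array values
```

Points that fall between nodes are linearly interpolated by default.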


1 Answer


If I understood you correctly, you have a very fine population density map, which you're trying to make coarse, by aggregating population densities within every 5x5 pixel zone. Is that right?

So when you say you're trying to make 1km x 1km into 5km x 5km you mean that each pixel currently represents population in a 1km x 1km region, whereas you want to make it represent population in a 5km x 5km region.

If so, please don't use clustering and KD-trees! That would be a horribly inefficient way to do something far simpler.

This is probably what you want. To explain:

import numpy as np

# Suppose the 2D array is pop_density
coarseness = 5
temp = pop_density.reshape((pop_density.shape[0] // coarseness, coarseness,
                            pop_density.shape[1] // coarseness, coarseness))
coarse_pop_density = np.sum(temp, axis=(1, 3))

As stated in the other answer, this will only work if the shape of pop_density is an exact multiple of coarseness. I believe this is the case for you, since you say you have a 200x150 image, which you're trying to make coarse by a factor of 5.
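To see the reshape-and-sum at work on a small array (the values here are purely illustrative):

```python
import numpy as np

pop_density = np.arange(36, dtype=float).reshape(6, 6)
coarseness = 3
temp = pop_density.reshape((6 // coarseness, coarseness,
                            6 // coarseness, coarseness))
coarse = temp.sum(axis=(1, 3))
# coarse is 2x2; each entry is the sum of one 3x3 block,
# so the grand total is preserved
```

The reshape splits each axis into (block index, within-block offset), and summing over the two within-block axes collapses every block to a single number.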

For images that are not a multiple of the coarseness factor

# Suppose the size of pop_density was 198x147 instead of 200x150.
# Start by finding the next highest multiple of 5x5
shape = np.array(pop_density.shape)
new_shape = (coarseness * np.ceil(shape / coarseness)).astype(int)
# new_shape is now (200, 150)

# Create the zero-padded array and assign it with the old density
zp_pop_density = np.zeros(new_shape)
zp_pop_density[:shape[0], :shape[1]] = pop_density

# Now use the same method as before
temp = zp_pop_density.reshape((new_shape[0] // coarseness, coarseness,
                               new_shape[1] // coarseness, coarseness))
coarse_pop_density = np.sum(temp, axis=(1, 3))
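A quick sanity check of the zero-padded variant on a small array whose shape is not a multiple of the factor (illustrative data only):

```python
import numpy as np

pop_density = np.ones((7, 5))            # 7x5: not a multiple of 3
coarseness = 3
shape = np.array(pop_density.shape)
new_shape = (coarseness * np.ceil(shape / coarseness)).astype(int)  # (9, 6)

zp = np.zeros(new_shape)
zp[:shape[0], :shape[1]] = pop_density

coarse = zp.reshape((new_shape[0] // coarseness, coarseness,
                     new_shape[1] // coarseness, coarseness)).sum(axis=(1, 3))
# coarse is 3x2, and the padded zeros do not change the total
```

Note that padding with zeros is safe for a sum, but would bias an average over the edge blocks, as discussed in the comments below.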
  • Thanks! So this method can't be used when the original shape is not evenly divisible by the coarseness factor? – Han Zhengzu Jan 09 '16 at 05:47
  • 1
    You wouldn't be able to do it in such a neat way. You would have to handle the edge-cases separately. I think I would probably do it by zero-padding. I'll amend the answer in a minute. – Praveen Jan 09 '16 at 05:48
  • @AlecMcGail I think that if the original shape is divisible by the coarseness factor, then you could take `np.mean` instead of `np.sum` and you should be fine (i.e., assuming you are averaging by area). If not, you would have to do something different for the edge cases, since zero-padding will affect the average. – Praveen Nov 08 '20 at 20:48
  • Yup exactly. Amazed to see you after four years haha! – Alec McGail Nov 09 '20 at 21:17