1

I have a 2D NumPy array of size WIDTH x HEIGHT. I would like to bin the array by taking the median of each bin, so that the resulting array is WIDTH/binsize x HEIGHT/binsize. Assume that both WIDTH and HEIGHT are divisible by binsize. Edit: An example is given in the attached image.

I have found solutions where the binned array values are the sum or average of the individual elements in each bin: How to bin a 2D array in numpy?

However, I haven't been able to figure out how to do a median combine of the elements in each bin. Your help would be much appreciated!

Edit: image added. [Image: an example of the initial array and the desired resultant median-binned array]

user3835290

2 Answers

2

So you are looking for a median over a strided reshape:

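import numpy as np
a = np.arange(24).reshape(4,6)

def median_binner(a,bin_x,bin_y):
    # bin_x, bin_y: number of bins along the rows and columns (not the bin size)
    m,n = np.shape(a)
    # 4-D view indexed as [bin_row, bin_col, row_in_bin, col_in_bin].
    # Strides are byte counts, so they must be integers: use // rather than /.
    strided_reshape = np.lib.stride_tricks.as_strided(a,shape=(bin_x,bin_y,m//bin_x,n//bin_y),strides = a.itemsize*np.array([(m // bin_x) * n, (n // bin_y), n, 1]))
    # One median per bin, collected back into a (bin_x, bin_y) array
    return np.array([np.median(col) for row in strided_reshape for col in row]).reshape(bin_x,bin_y)



print("Original Matrix:")
print(a)
print("\n")
bin_tester1 = median_binner(a,2,3)
print("2x3 median bin :")
print(bin_tester1)
print("\n")
bin_tester2 = median_binner(a,2,2)
print("2x2 median bin :")
print(bin_tester2)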

result:

Original Matrix:
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]


2x3 median bin :
[[  3.5   5.5   7.5]
 [ 15.5  17.5  19.5]]


2x2 median bin :
[[  4.   7.]
 [ 16.  19.]]

Read this in order to completely understand the following line in the code:

strided_reshape = np.lib.stride_tricks.as_strided(a,shape=(bin_x,bin_y,m//bin_x,n//bin_y),strides = a.itemsize*np.array([(m // bin_x) * n, (n // bin_y), n, 1]))
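As a rough illustration of what that line does (my own sketch, not part of the original answer): each stride says how many bytes to move in memory per step along one axis, and the 4-D view is indexed as [bin_row, bin_col, row_in_bin, col_in_bin]. The names `blocks_strided`, `blocks_reshaped`, `bh` and `bw` are made up for this example, and it assumes a C-contiguous input. Under that assumption the same grouping can also be obtained with a plain reshape plus swapaxes, which makes a handy cross-check.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(24).reshape(4, 6)   # C-contiguous, same array as in the answer
m, n = a.shape
bin_x, bin_y = 2, 3               # number of bins along rows and columns
bh, bw = m // bin_x, n // bin_y   # height and width of one bin, in elements

# Strides are given in bytes.
blocks_strided = as_strided(
    a,
    shape=(bin_x, bin_y, bh, bw),
    strides=(bh * n * a.itemsize,  # jump to the next row of bins
             bw * a.itemsize,      # jump to the next column of bins
             n * a.itemsize,       # next row inside a bin
             a.itemsize),          # next column inside a bin
    writeable=False,               # guard against accidental writes through the view
)

# Same grouping without manual stride arithmetic.
blocks_reshaped = a.reshape(bin_x, bh, bin_y, bw).swapaxes(1, 2)

print(blocks_strided[0, 1])        # the bin holding 2, 3, 8, 9
print(np.array_equal(blocks_strided, blocks_reshaped))
```

Wrong strides make as_strided read arbitrary memory, so checking the view against a reshape-based version on a small array seems like a cheap safeguard.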

Kennet Celeste
  • Thanks! This isn't quite what I was looking for, I made a diagram to make it more clear in my question above. – user3835290 Oct 21 '16 at 21:35
  • Here you have median binned 0,1,2,3 then 4,5,6,7 then 8,9,10,11 etc. I want to bin 0,1,6,7 then 2,3,8,9 then 4,5,10,11 etc. – user3835290 Oct 21 '16 at 21:38
  • @user3835290 I now understand your problem and have fixed the code based on what you asked. Please "accept" the answer like [here](http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work) so that other people can find it as well. – Kennet Celeste Oct 22 '16 at 22:03
  • This is a very nice solution! Obviously you can swap median for any function you wish to apply to the col array, which has shape (m//bin_x, n//bin_y). I still have a hard time completely understanding these stride tricks, but https://towardsdatascience.com/advanced-numpy-master-stride-tricks-with-25-illustrated-exercises-923a9393ab20?gi=9b2e8492685c is a good start. It is probably unsafe to run it on arrays whose dimensions are not divisible by the number of bins, so the array has to be checked or cropped first. – VojtaK Aug 31 '22 at 15:04
  • I would just like to add that bin_x in the median_binner function is a dimension of the new array, along the axis that used to have m elements. Since this corresponds to the matrix rows, I would rather denote it with y. So if bin_x and bin_y were the sizes of the box to bin relative to n, m, the fix would be bin_m=m//bin_y, bin_n=n//bin_x, and to use // in the strides argument, because otherwise current numpy will object to a float64 argument where it expects an integer. – VojtaK Aug 31 '22 at 16:21
0

I was dealing with the same issue. I found Kennet Celeste's answer very useful, but there are some caveats. First, the strided reshape is fast, but the Python loop that follows it is slow. The trick is to bring all the values that go into one median next to each other in memory and then use a vectorized NumPy operation.
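To make the caveat concrete, here is a minimal sketch (my own variation, not code from the answer above) that keeps the strided view but replaces the Python loop with a single np.median call over the two within-bin axes. The function name `median_binner_vectorized` is hypothetical; bin_x and bin_y are the number of bins along rows and columns, as in the original function, and the input is made C-contiguous first because the stride arithmetic assumes it.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def median_binner_vectorized(a, bin_x, bin_y):
    """Median-bin `a` into a (bin_x, bin_y) array without a Python loop."""
    a = np.ascontiguousarray(a)          # the strides below assume C order
    m, n = a.shape
    if m % bin_x or n % bin_y:
        raise ValueError("array shape must be divisible by the number of bins")
    bh, bw = m // bin_x, n // bin_y      # bin height and width in elements
    s = a.itemsize
    view = as_strided(a, shape=(bin_x, bin_y, bh, bw),
                      strides=(bh * n * s, bw * s, n * s, s),
                      writeable=False)
    # Reduce over the two within-bin axes in one call.
    return np.median(view, axis=(2, 3))

a = np.arange(24).reshape(4, 6)
print(median_binner_vectorized(a, 2, 3))
print(median_binner_vectorized(a, 2, 2))
```

np.median still copies the data internally, so this is not free, but the per-bin work happens in compiled code rather than in a list comprehension.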

If you don't want to fiddle with the strided reshape, you can use the np.swapaxes function instead. Say I have an array X of size xdim x ydim and want to bin it with a window of bin_x x bin_y:

import numpy as np
# Some sample values
xdim = 5039
ydim = 6637
bin_x = 5
bin_y = 7
X = np.random.rand(ydim, xdim)
# Compute reduced dimensions so that bin_x divides xdim_red and bin_y divides ydim_red
xdim_red = xdim - xdim % bin_x
ydim_red = ydim - ydim % bin_y
# Dimensions after binning
xdim_bin = xdim_red // bin_x
ydim_bin = ydim_red // bin_y
# Crop X to the reduced dimensions
X = X[0:ydim_red, 0:xdim_red]
# Alternative to the strided reshape: split each axis into (bin index, offset within bin).
# reshape is used instead of assigning to X.shape, which NumPy discourages.
X_blocks = X.reshape(ydim_bin, bin_y, xdim_bin, bin_x)
X_reshaped = X_blocks.swapaxes(1, 2)
# The following can be done on the strided-reshape array as well; it finally brings together the chunks of memory that belong to one bin
X_reshaped = X_reshaped.reshape((ydim_bin, xdim_bin, bin_x*bin_y))
# There could be a faster implementation, but this at least uses a batched (vectorized) median
g = np.median(X_reshaped, axis=-1)
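The steps above can be wrapped in a small function and checked against a brute-force loop on a small array. This is only a sketch: the name `median_bin` and its argument order are my own choices, bin_y and bin_x are the window height and width, and leftover rows and columns at the edges are silently cropped, as in the code above.

```python
import numpy as np

def median_bin(X, bin_y, bin_x):
    """Median of non-overlapping (bin_y, bin_x) windows; leftover edges are cropped."""
    ydim, xdim = X.shape
    ydim_bin, xdim_bin = ydim // bin_y, xdim // bin_x
    cropped = X[:ydim_bin * bin_y, :xdim_bin * bin_x]
    # Axes after swapaxes: (bin_row, bin_col, row_in_bin, col_in_bin)
    blocks = cropped.reshape(ydim_bin, bin_y, xdim_bin, bin_x).swapaxes(1, 2)
    # Flatten each window so one median call handles all bins at once
    return np.median(blocks.reshape(ydim_bin, xdim_bin, bin_y * bin_x), axis=-1)

rng = np.random.default_rng(0)
X = rng.random((23, 17))              # deliberately not divisible by the window
g = median_bin(X, bin_y=7, bin_x=5)

# Brute-force reference: slice each window and take its median
expected = np.array([[np.median(X[i * 7:(i + 1) * 7, j * 5:(j + 1) * 5])
                      for j in range(3)] for i in range(3)])
print(g.shape)
print(np.allclose(g, expected))
```

If cropping is not acceptable, one option is to raise an error when the shape is not divisible by the window instead of slicing.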
VojtaK