
I'm new to numpy and I have a 2D array of objects that I need to bin into a smaller matrix and then get a count of the number of objects in each bin to make a heatmap. I followed the answer on this thread to create the bins and do the counts for a simple array but I'm not sure how to extend it to 2 dimensions. Here's what I have so far:

data_matrix = numpy.ndarray((500,500),dtype=float)
# fill array with values.

bins = numpy.linspace(0,50,50)
digitized = numpy.digitize(data_matrix, bins)

binned_data = numpy.ndarray((50,50))
for i in range(0,len(bins)):
    for j in range(0,len(bins)):
        k = len(data_matrix[digitized == i:digitized == j]) # <- does not work
        binned_data[i:j] = k

P.S. The [digitized == i] notation on an array returns an array of boolean values. I cannot find documentation on this notation anywhere; a link would be appreciated.
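
For reference, the notation in the P.S. is what NumPy's indexing documentation calls boolean array indexing (often "boolean masking"). Here is a minimal sketch of it on a 1D array; the sample values and the names `values`, `mask`, `count` and `selected` are made up for illustration and are not from the question:

```python
import numpy as np

# Illustrative data, not the asker's 500x500 matrix
values = np.array([0.5, 1.5, 2.5, 1.2])
bins = np.array([0.0, 1.0, 2.0, 3.0])

# np.digitize returns, for each value, the 1-based index of the bin it falls in
digitized = np.digitize(values, bins)

# Comparing an array with a scalar gives a boolean array of the same shape
mask = digitized == 2

# Counting the True entries gives the number of values in that bin
count = np.count_nonzero(mask)

# Indexing with the boolean array keeps only the elements where it is True
selected = values[mask]

print(mask, count, selected)
```

Note that a boolean mask goes inside the brackets on its own (or combined with `&`), not as the two ends of a slice, which may be why the `digitized == i:digitized == j` line fails.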

denis
Mike T
  • Can you give a simple example with a 3x3 array or similar? I don't understand your purpose. Maybe you want to have a look at `np.histogram2d`? – Syrtis Major Mar 17 '16 at 14:56
  • Take a look at this [link](https://scipython.com/blog/binning-a-2d-array-in-numpy/) – Yonatan Simson Apr 30 '18 at 10:47

3 Answers


You can reshape the array to a four dimensional array that reflects the desired block structure, and then sum along both axes within each block. Example:

>>> a = np.arange(24).reshape(4, 6)
>>> a
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])
>>> a.reshape(2, 2, 2, 3).sum(3).sum(1)
array([[ 24,  42],
       [ 96, 114]])

If a has shape (m, n), and m and n are divisible by m_bins and n_bins respectively, the reshape should have the form

a.reshape(m_bins, m // m_bins, n_bins, n // n_bins)
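
As a rough sketch, the general form can be wrapped in a small helper. The name `block_sum` and the divisibility check are my own additions for illustration, not part of the answer above:

```python
import numpy as np

def block_sum(a, m_bins, n_bins):
    """Sum a 2D array over an (m_bins, n_bins) grid of equal-sized blocks."""
    m, n = a.shape
    # Assumption: both dimensions divide evenly into the requested bin counts
    if m % m_bins or n % n_bins:
        raise ValueError("array shape must be divisible by the bin counts")
    blocks = a.reshape(m_bins, m // m_bins, n_bins, n // n_bins)
    # Axes 1 and 3 run over the positions inside each block
    return blocks.sum(axis=(1, 3))

a = np.arange(24).reshape(4, 6)
result = block_sum(a, 2, 2)
print(result)
```

For the 4x6 example this gives the same values as `a.reshape(2, 2, 2, 3).sum(3).sum(1)`; passing `axis=(1, 3)` just does both sums in one call.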
Sven Marnach

At first I was also going to suggest that you use np.histogram2d rather than reinventing the wheel, but then I realized that it would be overkill to use that and would need some hacking still.
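
For the curious, here is roughly what the np.histogram2d route might look like. This is only a sketch, assuming the array shape divides evenly into the bins: each element's row and column index is used as its coordinate and its value as the weight. The names `rows`, `cols` and `H` are illustrative.

```python
import numpy as np

a = np.arange(24).reshape(4, 6)

# Row and column index of every element, used as the "x" and "y" samples
rows, cols = np.indices(a.shape)

# Weighting each sample by its value turns the per-bin counts into per-bin sums
H, row_edges, col_edges = np.histogram2d(
    rows.ravel(), cols.ravel(), bins=(2, 2), weights=a.ravel()
)
print(H)
```

It returns floats and has to build two index arrays as large as the data, which is part of why it feels like overkill for plain block sums.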

If I understand correctly, you just want to sum over submatrices of your input. That's pretty easy to brute force: loop over the entries of the output matrix and sum the corresponding subblock of the input:

import numpy as np

def submatsum(data, n, m):
    # return a matrix of shape (n, m) holding the sum of each subblock
    bs = data.shape[0] // n, data.shape[1] // m  # block size summed over
    return np.reshape(
        np.array([np.sum(data[k1*bs[0]:(k1+1)*bs[0], k2*bs[1]:(k2+1)*bs[1]])
                  for k1 in range(n) for k2 in range(m)]),
        (n, m))

# set up dummy data
N,M = 4,6
data_matrix = np.reshape(np.arange(N*M),(N,M))

# shape of the reduced matrix for 2x3 blocks; assumes N and M divide evenly
n,m = N//2,M//3
reduced_matrix = submatsum(data_matrix,n,m)

# check output
print(data_matrix)
print(reduced_matrix)

This prints

print(data_matrix)
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]

print(reduced_matrix)
[[ 24  42]
 [ 96 114]]

which is indeed the result for summing up submatrices of shape (2,3).

Note that I'm using // for integer division so the code is Python 3 compatible; in Python 2 you can use plain / instead, since the numbers involved are integers.


Another option is the binArray function from the comments here: Binning a numpy array

Using your example:

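import numpy as np
# binArray comes from the linked "Binning a numpy array" thread; it is not defined here
data_matrix = np.ndarray((500, 500), dtype=float)
binned_data = binArray(data_matrix, 0, 10, 10, np.sum)  # bin along axis 0
binned_data = binArray(binned_data, 1, 10, 10, np.sum)  # then along axis 1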

The result sums every 10x10 square of data_matrix (of size 500x500) into a single value per square in binned_data (of size 50x50).
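
Since binArray itself is not reproduced here, below is a minimal stand-in that behaves similarly for the non-overlapping case. The name `bin_along_axis` and its signature are hypothetical and differ from the linked function; it assumes consecutive bins of `binsize` elements and drops any trailing remainder.

```python
import numpy as np

def bin_along_axis(data, axis, binsize, func=np.sum):
    """Hypothetical stand-in for binArray: reduce non-overlapping bins along one axis."""
    data = np.asarray(data)
    n_bins = data.shape[axis] // binsize  # any trailing remainder is dropped
    # Move the target axis to the front, trim it, and split it into (n_bins, binsize)
    moved = np.moveaxis(data, axis, 0)[: n_bins * binsize]
    grouped = moved.reshape((n_bins, binsize) + moved.shape[1:])
    # Reduce over the positions inside each bin
    reduced = func(grouped, axis=1)
    # Put the binned axis back where it came from
    return np.moveaxis(reduced, 0, axis)

# Dummy data: all ones, so every 10x10 square should sum to 100
data_matrix = np.ones((500, 500))
binned_data = bin_along_axis(data_matrix, 0, 10)
binned_data = bin_along_axis(binned_data, 1, 10)
print(binned_data.shape)
```

Any reduction that accepts an `axis` argument (np.mean, np.max, and so on) can be passed as `func` in this sketch.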

Hope this helps!

Alexandre Kempf