I have a matrix that contains NO2 measurements for a certain part of the globe, along with 2 matrices of the same size that contain the latitudes and longitudes.

NO2 = np.random.rand(100,100)
lat = np.random.rand(100,100)*90.
lon = np.random.rand(100,100)*180

I want to bin these NO2 values based on lat and lon into bins of 0.125 degrees, that look like this:

latBins = np.linspace(-90, 90, int(180 / .125) + 1)
lonBins = np.linspace(-180, 180, int(360 / .125) + 1)

Now, I know that numpy.digitize and numpy.histogram can return me the indices of the bins that each NO2 value belongs to, but I want the actual binned matrix. This matrix looks as follows:

    binnedMatrix = np.zeros((1440,2880,15))

with each bin having a depth of 15. If I now index binnedMatrix[0][0] (which holds all points with longitudes between -180. and -179.875 and latitudes between -90. and -89.875), I would like to get back all the NO2 values that were binned within these lats and lons. This would make it possible to just store this matrix somewhere, which is what I want.

Is there any function that returns this matrix? Or is there any way this can be done without a for loop?

Jesse
  • What does A represent? The maximum number of values in 1 bin? That leaves you with a ragged array in the third dimension and is probably not a numpy problem. – Daniel F Feb 22 '17 at 09:16
  • Okay, I know my data well enough that at most 15 NO2 values can be in 1 bin, so A=15. Bins with fewer than 15 values will just be padded with zeros. – Jesse Feb 22 '17 at 10:39

2 Answers

1

I've come across a similar problem, and your last comment seems to be relevant.

Assuming points in three-dimensional space with axes x, y and z, I want to put each z value into a bin determined by its x and y position. This answer uses np.digitize, which works on one-dimensional bin edges, but it can be adjusted to suit three dimensions.

In [1]: import numpy as np

In [2]: data = np.random.randint(0, 100, 3000).reshape(-1, 3)

In [3]: data
Out[3]: 
array([[59, 94, 85],
       [97, 47, 71],
       [27, 10, 23],
       ..., 
       [48, 61, 87],
       [72, 22, 86],
       [80, 47, 45]])

In [4]: bins = np.linspace(0, 100, 10)

In [5]: bins
Out[5]: 
array([   0.        ,   11.11111111,   22.22222222,   33.33333333,
         44.44444444,   55.55555556,   66.66666667,   77.77777778,
         88.88888889,  100.        ])

In [6]: digitized = np.digitize(data[:, 0:2], bins)

In [7]: digitized
Out[7]: 
array([[6, 9],
       [9, 5],
       [3, 1],
       ..., 
       [5, 6],
       [7, 2],
       [8, 5]])

In [8]: data[np.equal(digitized, [6, 9]).all(axis=1)]
Out[8]: 
array([[59, 94, 85],
       [56, 94, 80],
       [63, 97, 73],
       [64, 94, 13],
       [58, 92, 29],
       [60, 97, 53],
       [65, 92, 95],
       [64, 91, 40],
       [59, 92, 93],
       [58, 94, 77],
       [58, 89, 66],
       [60, 89, 19],
       [65, 95, 13],
       [65, 89, 39]])

In [9]: data[np.equal(digitized, [6, 9]).all(axis=1)][:, 2]
Out[9]: array([85, 80, 73, 13, 29, 53, 95, 40, 93, 77, 66, 19, 13, 39])

To solve your problem, use data[np.equal(digitized, [index_latitude, index_longitude]).all(axis=1)][:, 2]. This will retrieve all the NO2 values in that bin, although you can get more than 15 for a bin.
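To go from these per-bin lookups to the full zero-padded matrix without a Python loop, something along the following lines might work. This is only a sketch under assumptions: the function name `bin_to_padded` and its parameters are made up for illustration, values beyond `depth` in a bin are silently dropped, and points lying exactly on the outer edges are clipped into the first or last bin. The idea is to sort the samples by bin, compute each sample's rank within its bin, and use that rank as the index along the third axis.

```python
import numpy as np

def bin_to_padded(values, lat, lon, lat_edges, lon_edges, depth):
    """Scatter values into a (n_lat, n_lon, depth) array, zero-padded per bin."""
    n_lat = len(lat_edges) - 1
    n_lon = len(lon_edges) - 1

    # np.digitize is 1-based for in-range data; shift to 0-based and clip
    # so that points on the outermost edges fall into the first/last bin.
    i_lat = np.clip(np.digitize(np.ravel(lat), lat_edges) - 1, 0, n_lat - 1)
    i_lon = np.clip(np.digitize(np.ravel(lon), lon_edges) - 1, 0, n_lon - 1)
    vals = np.ravel(values)

    # Sort by flattened bin index so the members of each bin are contiguous.
    flat = i_lat * n_lon + i_lon
    order = np.argsort(flat, kind="stable")
    flat_sorted = flat[order]

    # Rank of each sample within its bin: its position minus the start of its run.
    starts = np.flatnonzero(np.r_[True, flat_sorted[1:] != flat_sorted[:-1]])
    run_lengths = np.diff(np.r_[starts, flat_sorted.size])
    rank = np.arange(flat_sorted.size) - np.repeat(starts, run_lengths)

    # Assumption: anything past `depth` values in one bin is discarded.
    keep = rank < depth
    out = np.zeros((n_lat * n_lon, depth))
    out[flat_sorted[keep], rank[keep]] = vals[order][keep]
    return out.reshape(n_lat, n_lon, depth)

# Small demonstration on a coarse 4 x 4 grid with at most 3 values per bin.
demo = bin_to_padded(
    np.array([1.0, 2.0, 3.0]),
    np.array([-80.0, -80.0, 10.0]),
    np.array([-170.0, -170.0, 100.0]),
    np.linspace(-90, 90, 5),
    np.linspace(-180, 180, 5),
    depth=3,
)
```

With the arrays from the question, the call would be `bin_to_padded(NO2, lat, lon, latBins, lonBins, 15)`. Keep in mind that a 1440 x 2880 x 15 array of float64 takes roughly 500 MB.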

Michael Gecht
0

I'm super confused on what it is you want exactly. However, this was my interpretation of what you wrote.

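import numpy as np
import pandas as pd

# Flatten the three matrices into one long table, remembering each
# value's original row (i) and column (j).
n, m = NO2.shape
df = pd.DataFrame(dict(
        NO2=NO2.ravel(),
        lat=lat.ravel(),
        lon=lon.ravel(),
        i=np.arange(n).repeat(m),
        j=np.tile(np.arange(m), n)
    ))

# Assign every row to a 0.125-degree latitude and longitude interval.
# np.linspace needs an integer number of points.
latBins = pd.cut(df.lat, np.linspace(-90, 90, int(180 / .125) + 1))
lonBins = pd.cut(df.lon, np.linspace(-180, 180, int(360 / .125) + 1))

g = df.groupby([latBins, lonBins])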

Then I can grab a particular group

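# Recent pandas versions key the groups by Interval objects rather than strings.
g.get_group((pd.Interval(0.875, 1.0), pd.Interval(83.75, 83.875)))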

           NO2   i   j       lat        lon
6968  0.645213  69  68  0.956681  83.754923
8495  0.383437  84  95  0.964288  83.863002
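If the goal is a zero-padded array rather than individual groups, the grouping could be turned into array indices with `cumcount`. The following is a rough sketch, not a verified solution for the real data: the function name `pandas_binned` and the column names `ilat` and `ilon` are invented here, and it assumes that values beyond `depth` in a bin can be dropped and that points outside the bin edges should be ignored. Note that `pd.cut` uses right-closed intervals by default, so a value lying exactly on an edge goes to the lower bin.

```python
import numpy as np
import pandas as pd

def pandas_binned(NO2, lat, lon, lat_edges, lon_edges, depth):
    """Build a (n_lat, n_lon, depth) zero-padded array via groupby/cumcount."""
    df = pd.DataFrame({
        "NO2": np.ravel(NO2),
        "lat": np.ravel(lat),
        "lon": np.ravel(lon),
    })

    # labels=False makes pd.cut return 0-based bin numbers instead of intervals.
    df["ilat"] = pd.cut(df["lat"], lat_edges, labels=False, include_lowest=True)
    df["ilon"] = pd.cut(df["lon"], lon_edges, labels=False, include_lowest=True)

    # Points outside the edges get NaN; drop them (an assumption of this sketch).
    df = df.dropna(subset=["ilat", "ilon"])
    ilat = df["ilat"].to_numpy(dtype=int)
    ilon = df["ilon"].to_numpy(dtype=int)

    # cumcount numbers the rows within each (ilat, ilon) group: 0, 1, 2, ...
    slot = df.groupby(["ilat", "ilon"]).cumcount().to_numpy()

    # Assumption: anything past `depth` values in one bin is discarded.
    keep = slot < depth
    out = np.zeros((len(lat_edges) - 1, len(lon_edges) - 1, depth))
    out[ilat[keep], ilon[keep], slot[keep]] = df["NO2"].to_numpy()[keep]
    return out

# Small demonstration on a coarse 4 x 4 grid with at most 3 values per bin.
demo = pandas_binned(
    np.array([1.0, 2.0, 3.0]),
    np.array([-80.0, -80.0, 10.0]),
    np.array([-170.0, -170.0, 100.0]),
    np.linspace(-90, 90, 5),
    np.linspace(-180, 180, 5),
    depth=3,
)
```

Indexing the result as `out[index_lat, index_lon]` then gives the values of that bin, padded with zeros. For the full 0.125-degree grid the output array alone takes roughly 500 MB as float64.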
piRSquared
  • Okay, I get that I can do that. What I want however, is those NO2 values to be grouped in a matrix of say 1440x2880x15 (say at most 15 values will be in each bin), and that I get the following result: matrix[indexlat][indexlon] = [all 15 NO2 values in this bin] – Jesse Feb 22 '17 at 10:47