Dividing 3D data into cubic subsets and counting the points inside the cube

Question

I have an archive of data (xx, yy, EXTRA) and I want to divide the data into grid cells of equal size. For example, let's suppose that the data is:

import numpy as np

xx=np.array([0.1,  0.2,   3,   4.1,  3, 0.1])
yy=np.array([0.35, 0.15, 1.5,  4.5, 3.5, 3])
EXTRA=np.array([0.01,0.003,2.002,4.004,0.5,0.2])

I want to make a square grid with cells of size 1x1, and then obtain the sum of "EXTRA" over the points inside each cell.

This is what I tried

import math

for i in range(0,5):
    for j in range(0,5):
        for x,y in zip(xx,yy):
            k=math.floor(x)
            kk=math.floor(y)
            if i<=k<i+1.0 and j<=kk<j+1.0:
                print("(x,y)=" ,x,",",y,",","(i,j)=",i,",",j ,"Unknown sum of EXTRA")

I obtain this output:

(x,y)= 0.1 , 0.35 , (i,j)= 0 , 0 Unknown sum of EXTRA
(x,y)= 0.2 , 0.15 , (i,j)= 0 , 0 Unknown sum of EXTRA
(x,y)= 0.1 , 3.0 , (i,j)= 0 , 3 Unknown sum of EXTRA
(x,y)= 3.0 , 1.5 , (i,j)= 3 , 1 Unknown sum of EXTRA
(x,y)= 3.0 , 3.5 , (i,j)= 3 , 3 Unknown sum of EXTRA
(x,y)= 4.1 , 4.5 , (i,j)= 4 , 4 Unknown sum of EXTRA

So the first two points, with coordinates (0.1,0.35) and (0.2,0.15), are inside the quadrant (0,0). Looking at "EXTRA", I know that for the quadrant (0,0) the sum of "EXTRA" should be Sum_extra = 0.01+0.003. However, I can't figure out how to compute that sum in code.

More information

My real problem is that I have "particles" inside a big cubic box, and I want to subdivide the box into smaller boxes and obtain, for each of the smaller boxes, the sum of the "mass" of the particles inside it. In my example, "EXTRA" plays the role of the mass.

I suspect that the way I classify whether a particle belongs to a quadrant is slow, which would be a problem since I have a lot of data. Any suggestions will be appreciated.

`I suspect that ... is slow` - did you do any testing to validate this? — wwii, Oct 17 '20 at 17:42
Not yet, I'm working with smaller samples before I do the full work. However, I think I have found a simpler way to do what I want; I will post the solution as a comment if it works. — Cruz, Oct 17 '20 at 17:47
You can also take a look at [my research](https://stackoverflow.com/questions/59239886/what-is-the-fastest-way-to-map-group-names-of-numpy-array-to-indices) of the fastest solution in 3D. — mathfux, Oct 17 '20 at 19:03
`pandas` appears to win here but you can achieve 2x - 3x speed-ups if you use dimensionality reduction. — mathfux, Oct 17 '20 at 19:06

wwii · Answer 1 · 2020-10-17T18:26:19.100

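Combine the three arrays with zip, converting xx and yy to integers so that each point carries the indices of its 1x1 cell, and sort the result on those two indices. Then group by the indices and get the sum of the EXTRA values for each group. The sort is needed because itertools.groupby only groups consecutive items with equal keys. Note that astype(int) truncates toward zero, so it matches math.floor only for non-negative coordinates.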

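import itertools
import operator

# important: picks the (x, y) cell indices; xtra: picks the EXTRA value (last item)
important = operator.itemgetter(0,1)
xtra = operator.itemgetter(-1)
# convert coordinates to integer cell indices and sort so equal keys are adjacent
data = sorted(zip(xx.astype(int),yy.astype(int),EXTRA),key=important)
gb = itertools.groupby(data,important)
for key,group in gb:
    values = list(map(xtra,group))
    print(key,values,sum(values))
    # or just
    #print(key,sum(map(xtra,group)))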
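
If sorting is not wanted, the same per-cell sums can probably be accumulated in a single pass with a dictionary keyed by the cell indices. This is only a sketch of that alternative, not part of the approach above; the names cell_size and sums are made up for the illustration, and math.floor is used as in the question's own attempt.

```python
import math
from collections import defaultdict

import numpy as np

xx = np.array([0.1, 0.2, 3, 4.1, 3, 0.1])
yy = np.array([0.35, 0.15, 1.5, 4.5, 3.5, 3])
EXTRA = np.array([0.01, 0.003, 2.002, 4.004, 0.5, 0.2])

cell_size = 1.0  # assumed grid spacing (1x1 cells, as in the question)

# Map each (i, j) cell index to the running sum of EXTRA in that cell.
sums = defaultdict(float)
for x, y, extra in zip(xx, yy, EXTRA):
    key = (math.floor(x / cell_size), math.floor(y / cell_size))
    sums[key] += extra

# Print only the cells that contain at least one point.
for key in sorted(sums):
    print(key, sums[key])
```

Each point is visited once, so the 5x5 loop over cells from the question is no longer needed, and empty cells simply never appear in the dictionary.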

Same concept using a Pandas DataFrame.

import pandas as pd
xx, yy = xx.astype(int),yy.astype(int)

In [25]: df = pd.DataFrame({'xx':xx,'yy':yy,'EXTRA':EXTRA})

In [26]: df.groupby(['xx','yy'])['EXTRA'].sum()
Out[26]: 
xx  yy
0   0     0.013
    3     0.200
3   1     2.002
    3     0.500
4   4     4.004
Name: EXTRA, dtype: float64
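
For the 3D case described in the question (particles in a cubic box, with the mass summed per sub-box), one option that stays inside NumPy is a weighted histogram. The sketch below first checks np.histogram2d against the 2D example data, then applies np.histogramdd to randomly generated particles. The values box_size and n_cells, and the positions and mass arrays, are invented for the illustration and are not from the question.

```python
import numpy as np

# 2D check on the example data: 5x5 cells of size 1x1, weighted by EXTRA.
xx = np.array([0.1, 0.2, 3, 4.1, 3, 0.1])
yy = np.array([0.35, 0.15, 1.5, 4.5, 3.5, 3])
EXTRA = np.array([0.01, 0.003, 2.002, 4.004, 0.5, 0.2])
grid2d, _, _ = np.histogram2d(
    xx, yy, bins=5, range=[[0, 5], [0, 5]], weights=EXTRA
)
print(grid2d[0, 0])  # sum of EXTRA in cell (0, 0)

# 3D sketch with made-up particles in a cubic box.
rng = np.random.default_rng(0)
box_size = 10.0  # hypothetical edge length of the big box
n_cells = 5      # hypothetical number of sub-boxes along each axis
positions = rng.uniform(0.0, box_size, size=(1000, 3))  # made-up (x, y, z)
mass = rng.uniform(0.5, 1.5, size=1000)                 # made-up masses

bins = (n_cells, n_cells, n_cells)
box_range = [(0.0, box_size)] * 3

# mass_grid[i, j, k] is the total mass in sub-box (i, j, k).
mass_grid, edges = np.histogramdd(
    positions, bins=bins, range=box_range, weights=mass
)
# Without weights, the same call counts the particles per sub-box.
counts, _ = np.histogramdd(positions, bins=bins, range=box_range)

print(mass_grid.shape, counts.sum())
```

Unlike the groupby result, the output here is a dense array that also holds the empty sub-boxes (as zeros), which may or may not be desirable for very fine grids.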
