I have a set of 500,000 spheres 3D coordinates along with their radii (non-unique) which are put in a pandas DataFrame object.
I would like to create an efficient search routine where I sample many points' coordinates, and the routine returns if each point is in a sphere, and if yes, in which sphere it is.
To do that, I read that using a cartesian search mesh can be the solution. I therefore created a mesh with a given cell size, and assigned ids to each sphere. In case it is useful, I also stored the differences between the center of the cell and the center of the sphere.
import pandas as pd
import numpy as np
size_cell = 20
fake_coordinates = np.uniform(-200,200, (500000, 3)) # Here spheres will cross, which should not be the case in the real input. If point in two spheres, choose the first one that comes.
data = pd.DataFrame(fake_coordinates, columns=['x','y','z'])
data['r'] = np.uniform(1,3, 500000)
x_vect = np.arange(data.x.min()-np.max(data.r), data.x.max()+np.max(data.r), size_cell)
y_vect = np.arange(data.y.min()-np.max(data.r), data.y.max()+np.max(data.r), size_cell)
z_vect = np.arange(data.z.min()-np.max(data.r), data.z.max()+np.max(data.r), size_cell)
data['i_x'] = ((data.x-x_vect[0])//size_cell).astype(int)
data['i_y'] = ((data.y-y_vect[0])//size_cell).astype(int)
data['i_z'] = ((data.z-z_vect[0])//size_cell).astype(int)
data['dx'] = data.x-(x_vect[data.i_x]+x_vect[data.i_x+1])/2
data['dy'] = data.y-(y_vect[data.i_y]+y_vect[data.i_y+1])/2
data['dz'] = data.z-(z_vect[data.i_z]+z_vect[data.i_z+1])/2
Then, I noticed that a sphere which has its center that belongs to a neighboring cell can cross its cell and therefore, the sampled point can be in this neighbor sphere. So I computed the crossing nature of each sphere.
data['crossing_x_left'] = data.dx < - (size_cell - data.r)
data['crossing_x_right'] = data.dx > (size_cell - data.r)
data['crossing_y_left'] = data.dy < - (size_cell - data.r)
data['crossing_y_right'] = data.dy > (size_cell - data.r)
data['crossing_z_down'] = data.dz < - (size_cell - data.r)
data['crossing_z_top'] = data.dz > (size_cell - data.r)
Now I should sample N points:
sample = np.uniform(-200,200, (1000000, 3))
If I take into account only one point and only candidate spheres with the center within the cell:
i_x = (sample[0,0]-x_vect[0])//size_cell
i_y = (sample[0,1]-y_vect[0])//size_cell
i_z = (sample[0,2]-z_vect[0])//size_cell
ok_x = data['i_x']==i_x
ok_y = data['i_y']==i_y
ok_z = data['i_z']==i_z
candidates = data.loc[ok_x & ok_y & ok_z]
Now I want to test many points at the same time, but also take into account the crossing spheres. But then I run into a method issue. How to efficiently compute (probably meaning processing all points and all spheres at the same time using matrices) if the points belong to a sphere, and if yes, in which sphere they are? The part where spheres can cross the boundary makes it difficult for me to see.