3

I have a Numpy array a of shape 5000 * 3 that contains the coordinates of 5000 points in 3D space. I also have another array b of shape 512 * 3 that again contains the coordinates of another 512 points in that 3D space. I wish to select 512 points from a that are closest to points in b. Meaning that to find a Numpy array c of shape 512 * 3 from the points in a such that for each 1<=i<=512, c[i] is the closest point to b[i], from all of the 5000 points in a.

To find c, there is one way to compare point by point and find the closest but it is very slow. I wish to find another faster way to approach this.

K.N
  • 871
  • 2
  • 10
  • 30
  • There is no faster way. You have to look at each point. If your data has "clusters" of points, you might be able to find the closest cluster and then search the cluster, but that's data-dependent. – Tim Roberts Aug 27 '21 at 04:59
  • 4
    [scipy.spatial.KDTree](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.KDTree.html#scipy.spatial.KDTree) may be helpful. – bb1 Aug 27 '21 at 05:02
  • 1
    @TimRoberts Why is there no faster way? Is it because of 3D? For example in 1D, you could sort both arrays and sweep them in parallel, ending up with O(n log n) instead of O(n^2) (if the arrays had equal length). Why is nothing like that possible in 3D? – no comment Aug 27 '21 at 05:04
  • 2
    @TimRoberts - There are much better algorithms than brute force, eg voronoi cell overlays, kd-trees, octrees or [voxel approximation](https://link.springer.com/content/pdf/10.1007/s00138-017-0889-4.pdf). – Michael Szczesny Aug 27 '21 at 05:18
  • @don'ttalkjustcode Because there's no good way to "sort" 3D points in a useful way. – Tim Roberts Aug 27 '21 at 06:02
  • @TimRoberts 1D and simple sorting was just an example.. For higher dimensions, what about all those things that Michael mentioned? – no comment Aug 27 '21 at 06:05
  • @MadPhysicist Id's say since this is Euclidean distance instead of spherical distance (which the dupe target is specifically about) and asking `numpy`-only (`kdtree` is implemented in `scipy`, and most answers in the dupe are using `sklearn` as well), that [this](https://stackoverflow.com/questions/52366421/how-to-do-n-d-distance-and-nearest-neighbor-calculations-on-numpy-arrays) is a better dupe target. – Daniel F Aug 27 '21 at 08:11
  • @don'ttalkjustcode The problem is all of those methods are still extremely slow when implemented in `python`, even in `numpy` due to the whole "interpreted vs compiled language" issue. You'd need a c-complied function, but that doesn't exist in `numpy` - and the question seems to be asking for `numpy`-only. In `numpy`, you're stuck with brute force. c-compiled functions for more efficient methods are available in `scipy.spatial` - but many 3rd-party APIs only include `numpy`, so specifying is important. – Daniel F Aug 27 '21 at 08:18
  • `numpy` is C-compiled. Well, technically a big part of it is Fortran-compiled, but it is all native code. – Tim Roberts Aug 27 '21 at 17:52
  • @don'ttalkjustcode Those algorithms essentially do clustering, which will certainly make searching easier, but it's still not sorting. There's just no accepted algorithm for looking at two points in 3D and saying "which one is less than the other"? – Tim Roberts Aug 27 '21 at 17:56
  • @TimRoberts Not sure why you're stuck with sorting. I already said that that was just an example. A sorted array in 1D, some other organized data structure in general. (Although those data structures/algorithms might very well involve some sorting to do their job.) – no comment Aug 27 '21 at 17:59

0 Answers0