7

I have a numpy.ndarray of 3d-points, i.e. the np.shape of it is (4350,3) and such a second numpy.ndarray of 3d-points of np.shape (10510,3). Now I am trying to find the right python-package to calculate the nearest neighbors in the second array of the points in the first array as quickly as possible.

I've found a quite similar question here: find the k nearest neighbours of a point in 3d space with python numpy but I don't understand how to use the solution there for my problem.

I'd very, very much appreciate your help on this!

Studentu
  • 1,375
  • 2
  • 10
  • 11
  • 1
    Is this a one time operation or are you going to be finding multiple nearest neighbors in the same set? I ask that because if it is a single operation and you are literally only looking for 1 point on new sets each time then a simple 1 time loop looking for the smallest squared distance will suffice and be the fastest. – Victor 'Chris' Cabral Jan 09 '19 at 16:47
  • @Victor'Chris'Cabral No, what I've implemented so far is finding the nearest neighbor through calculating the euclidean distance for each point of the first set for each point of the second one (4350*10510 = 45718500 times) and returning the points for the closest distances. But all this I do in a while-loop that runs ~20times and for several "first" pointsets, so this naive classical approach takes several hours. – Studentu Jan 09 '19 at 16:51
  • 1
    I definitely misspoke. I see your problem now. – Victor 'Chris' Cabral Jan 09 '19 at 16:54
  • 3
    I would suggest making a kdtree out of the smaller set and then doing the loop. https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.KDTree.html – Victor 'Chris' Cabral Jan 09 '19 at 16:57
  • @Victor'Chris'Cabral Thanks for the link and the suggestion! I had read through that documention before, but I don't know how to apply it to my problem (sorry if this should be clear from the documentation, I'm not very good at programming). Could you give me an example of how this could look like for my problem (I mean, for two 3D-pointsets in general)? – Studentu Jan 09 '19 at 17:09
  • I've recently been working on a kdtree problem myself. Here's a [great](https://www.youtube.com/watch?v=TLxWtXEbtFE) [primer](https://www.youtube.com/watch?v=E1_WCdUAtyE). – Ic3fr0g Jan 09 '19 at 17:57
  • @lc3fr0g Thanks for the links! My question is answered, but I'll definitely watch the videos. – Studentu Jan 10 '19 at 10:08

1 Answers1

8

Here is the KDTree way :

from scipy.spatial import KDTree

data= np.random.rand(10510,3)
sample= np.random.rand(4350,3)
kdtree=KDTree(data)

Then dist,points=kdtree.query(sample,2) will give you the 2 best neighbors for the 4350 candidates in about one second.

B. M.
  • 18,243
  • 2
  • 35
  • 54
  • Thank you so much, I will try it this way and then let you know if it worked! – Studentu Jan 09 '19 at 22:04
  • I'm sorry, I don't get what the returns of kdtree.query(sample,2) do mean. I'd have expected that in dist there are the Euclidean (since parameter 2 stands for the Euclidean distance) distances to the nearest neighbors and points are the points from the pointset "data", which are nearest to "sample". But dist looks like this: print(dist) [[0.02731417 0.03267154] [0.02175954 0.04624616] ... [0.03183459 0.03818426] [0.01794547 0.03079906]] and points like this: print(points) [[ 262 5545] [3667 5619] ... [ 696 9467] [9617 1987]] so I'm clearly wrong. – Studentu Jan 09 '19 at 23:44
  • Oh, I think kdtree.query(sample,2) is interpreted as kdtree.query(sample, k=2) instead of kdtree.query(sample, p=2), but what I need is the latter, right? – Studentu Jan 09 '19 at 23:57
  • no `p=2` is the default. `k` is for k-nearest points. if you want just the nearest kdtree.query(sample,1) or kdtree.query(sample) is suffisant since k=1 is the default. – B. M. Jan 10 '19 at 07:28
  • Yes, that's what I'd meant. Thanks for your answer, very nice of you to help! – Studentu Jan 10 '19 at 10:06