4

I have a Nx3 matrix that contains the x,y,z coordinates of N points in 3D space. I'd like to find the absolute distances between all points without duplicates.

I tried using scipy.spatial.distance.cdist() [see documentation here: https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html ]. However, the output matrix contains duplicats of distances. For example, the distance between the points P1 and P2 is calculated twice as distance from P1 to P2 and again as distance from P2 to P1. See code output:

>>> from scipy.spatial import distance
>>> points = [[1, 2, 3],
...           [4, 5, 6],
...           [7, 8, 9]]
>>> distances = distance.cdist(points, points, 'euclidean')
>>> print(distances)
[[ 0.          5.19615242 10.39230485]
 [ 5.19615242  0.          5.19615242]
 [10.39230485  5.19615242  0.        ]]

I'd like the output to be without dupilcates. For example, find the distance between the first point and all other points then the second point and the remaining points (exluding the first point) and so on. Ideally, in an efficient and scalable manner that preserves the order of the points. That is once I find the distances, I'd like to query them; e.g. finding distances within a certain range and be able to output points that correspond to these distances.

Othman
  • 43
  • 3
  • 1
    Look at its cousin `pdist` - https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html? – Divakar Jun 13 '19 at 13:18
  • Also, if you want to do the query of point within a tolerance all in one step (without explicitly finding all the distances) look into [`KDTree`](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.KDTree.html). For a general discussion, see [Here](https://stackoverflow.com/questions/52366421/how-to-do-n-d-distance-and-nearest-neighbor-calculations-on-numpy-arrays/52366706#52366706) – Daniel F Jun 13 '19 at 13:21
  • Thank you both! KDTree works perfectly for my purposes. – Othman Jun 13 '19 at 14:13

1 Answers1

1

Looks like in general you want a KDTree implementation, with query_pairs.

from scipy.spatial import KDTree
points_tree = KDTree(points)
points_in_radius = points_tree.query_pairs(radius)

This will be much faster than actually computing all of the instances and applying a tolerance.

Daniel F
  • 13,620
  • 2
  • 29
  • 55