Sort an array of multi D points by distance to a reference point

Question

I have a reference point p_ref stored in a numpy array with a shape of (1024,), something like:

print(p_ref)
>>> array([ p1,  p2,  p3, ..., p_n])

I also have a numpy array A_points with a shape of (1024,5000) containing 5000 points, each having 1024 dimensions like p_ref. My problem: I would like to sort the points in A_points by their (eucledian) distance to p_ref!

How can I do this? I read about scipy.spatial.distance.cdist and scipy.spatial.KDTree, but they both weren't doing exactly what I wanted and when I tried to combine them I made a mess. Thanks!

For reference and consistency let's assume:

p_ref = np.array([0,1,2,3])
A_points = np.reshape(np.array([10,3,2,13,4,5,16,3,8,19,4,11]), (4,3))

Expected output:

array([[ 3,  2, 10],
       [ 4,  5, 13],
       [ 3,  8, 16],
       [ 4, 11, 19]])

Can you also add the expected output for the toy example? I would assume the output would just be a different row order of `A_points`. — Divakar, Apr 22 '20 at 22:04

Christian Sloper · Accepted Answer · 2020-04-23T07:34:47.450

2

EDIT: Updated on suggestions by the OP.

I hope I understand you correctly, but you can calculate the distance between two vectors by using numpy.linalg.norm. Using this it should be as simple as:

A_sorted = sorted( A_points.T, key = lambda x: np.linalg.norm(x - p_ref ) )
A_sorted = np.reshape(A_sorted, (3,4)).T

edited Apr 23 '20 at 07:34

answered Apr 22 '20 at 18:59

Christian Sloper

7,440
3
15
28

Thanks, I like the simplicity of the answer, although some other method might be faster! For it to work though I have to transpose `A_points` and eventually reshape the answer, looking like this: `A_sorted = sorted( A_points.T, key = lambda x: np.linalg.norm(x - p_ref ) )`, `A_sorted = np.reshape(A_sorted, (3,4)).T`. I accept your post as an answer, but If you want edit it with as suggested, I also provided toy values in my initial question – NeStack Apr 22 '20 at 21:49

Divakar · Answer 2 · 2020-04-23T18:45:48.997

1

You can do something like this -

A_points[:,np.linalg.norm(A_points-p_ref[:,None],axis=0).argsort()]

Another with np.einsum that should be more efficient than np.linalg.norm -

d = A_points-p_ref[:,None]
out = A_points[:,np.einsum('ij,ij->j',d,d).argsort()]

Further optimize to leverage fast matrix-multiplication to replace last step -

A_points[:,((A_points**2).sum(0)+(p_ref**2).sum()-2*p_ref.dot(A_points)).argsort()]

edited Apr 23 '20 at 18:45

answered Apr 22 '20 at 19:05

Divakar

218,885
19
262
358

Thanks! The first suggestion is good, but for it to work I have to transpose `A_points` and get rid of `[:,None]`, eventually having `A_out = A_points.T[np.linalg.norm(A_points.T-p_ref,axis=1).argsort()]`. If you care, you can edit your answer. You can also test with these toy values, they agree with the dimension I have in my post: `p_ref = np.array([0,1,2,3])`, `A_points = np.reshape(np.array([10,3,2,13,4,5,16,3,8,19,4,11]), (4,3))`. The second suggestion has also an issue with the dimensions, but I made it work for `dim(A)=(4,4)`, if you want you can change it, too. – NeStack Apr 22 '20 at 21:39
@NeStack Are you sure? Both seem to work just fine for me. What made you conclude that these are not working? – Divakar Apr 22 '20 at 21:44
Thanks for the answers again! I just tested your snippets the way you present them on the toy values and they aren't making the desired calculation, although they are running without an error (take note that dim(p_ref)=n, dim(A_points)=(n,m), I also included the toy values in the question to make it clearer). Still, I only had to rotate variable to make them work, so +1. – NeStack Apr 22 '20 at 21:59
@NeStack So, I think you meant to sort each row of `A_points`. That was the confusion. Code edited. – Divakar Apr 23 '20 at 18:46

Sort an array of multi D points by distance to a reference point

2 Answers2