How to retain Index information when calculating euclidean distances in a dataframe?

Question

Hi, I would like to calculate Euclidean distances between all points with X, Y coordinates in a DataFrame and return the ID (the index) of the closest point.

Currently I am using this to create a distance matrix:

import pandas as pd
from scipy.spatial.distance import pdist, squareform
distancematrix = squareform(pdist(group))
df = pd.DataFrame(distancematrix)

followed by this to return the minimum point:

closest=df.idxmin()

I don't seem to be able to retain the correct ID/index in the first step, as it seems to assign column and row numbers from 0 onwards instead of using the index. Is there a way to keep the correct index here?

Duplicate? http://stackoverflow.com/questions/20303323/distance-calculation-between-rows-in-pandas-dataframe-using-a-distance-matrix – Back2Basics, Jul 27 '15 at 00:00
I don't think it's quite the same, as I am not sure how to produce the matrix without losing the individual IDs for each point – Gman, Jul 27 '15 at 01:20

score 0 · Answer 1 · answered Jul 27 '15 at 01:14

The distance matrix includes each point's distance to itself, which is always zero. Thus, unless the diagonal is excluded, idxmin() will report each point as its own closest point.
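To illustrate the point, and the original question about keeping the IDs, here is a minimal sketch. It assumes the coordinates live in columns named X and Y and that `group` has a meaningful index; the sample data and the names `dist` and `closest_including_self` are made up for the example. Passing the original index as both `index=` and `columns=` to the `pd.DataFrame` constructor keeps the labels on the distance matrix.

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Hypothetical sample data: three points labelled by a non-default index.
group = pd.DataFrame(
    {"X": [0.0, 3.0, 0.0], "Y": [0.0, 4.0, 1.0]},
    index=["a", "b", "c"],
)

# Label both axes with the original index so the IDs survive.
dist = pd.DataFrame(
    squareform(pdist(group[["X", "Y"]].to_numpy())),
    index=group.index,
    columns=group.index,
)

# The diagonal is zero, so every point is reported as its own nearest point.
closest_including_self = dist.idxmin()
print(closest_including_self)
```

With the zero diagonal left in place, each label simply maps to itself, which is the behaviour described above.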

dmargol1

In the workflow I replace the 0 values with NaN values to avoid this issue – Gman Jul 27 '15 at 01:19
The only danger there is if another row shares the same coordinates as the original point. Creating a mask along the diagonal is probably the best approach. – dmargol1 Jul 27 '15 at 02:38
I see your point, but all the points are unique, so that shouldn't occur. – Gman Jul 27 '15 at 04:14
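For anyone whose points are not guaranteed to be unique, the difference between the two approaches in these comments can be sketched as follows. This is only an illustration under assumptions: the sample data (where "d" duplicates "a") and the variable names are hypothetical, and `np.inf` is just one possible value for masking the diagonal.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Hypothetical data: "d" has the same coordinates as "a".
group = pd.DataFrame(
    {"X": [0.0, 3.0, 0.0, 0.0], "Y": [0.0, 4.0, 1.0, 0.0]},
    index=["a", "b", "c", "d"],
)
matrix = squareform(pdist(group[["X", "Y"]].to_numpy()))

# Approach 1: replace every 0 with NaN. This also hides true duplicates.
raw = pd.DataFrame(matrix, index=group.index, columns=group.index)
closest_nan = raw.replace(0, np.nan).idxmin()

# Approach 2: mask only the diagonal (self-distances), in place.
masked = matrix.copy()
np.fill_diagonal(masked, np.inf)
dist = pd.DataFrame(masked, index=group.index, columns=group.index)
closest = dist.idxmin()

print(closest_nan["a"], closest["a"])
```

In this example the NaN replacement reports "c" as the closest point to "a", while the diagonal mask reports the duplicate "d". Which answer is wanted depends on the data, so this is a design choice rather than a fix.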
