
Hi, I would like to calculate the Euclidean distances between all points with X, Y coordinates in a DataFrame and return the ID (the index) of the closest point.

Currently I am using this to create a distance matrix:

import pandas as pd
from scipy.spatial.distance import pdist, squareform
distancematrix = squareform(pdist(group))
df = pd.DataFrame(distancematrix)

I then use this to return the closest point:

closest = df.idxmin()

I don't seem to be able to retain the correct ID/index in the first step, as it assigns row and column numbers from 0 onwards instead of using the index. Is there a way to keep the correct index here?

Gman
  • duplicate? http://stackoverflow.com/questions/20303323/distance-calculation-between-rows-in-pandas-dataframe-using-a-distance-matrix – Back2Basics Jul 27 '15 at 00:00
  • I don't think it's quite the same, as I am not sure how to produce the matrix without losing the individual IDs for each point – Gman Jul 27 '15 at 01:20

1 Answer


The distance matrix includes each point's distance to itself, which will always be zero. Thus, you should expect each row to just see itself as its own minimum.
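
A minimal sketch of this effect, assuming `group` holds `X` and `Y` columns and a non-default index (the sample labels and coordinates below are made up for illustration). Passing the original index as both `index` and `columns` to the `pd.DataFrame` constructor is one way to keep the IDs as labels, and it makes the self-match easy to see:

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Hypothetical sample data; the IDs and coordinates are invented for illustration.
group = pd.DataFrame(
    {"X": [0.0, 1.0, 5.0], "Y": [0.0, 0.0, 0.0]},
    index=["a", "b", "c"],
)

# Build the square distance matrix and reuse the original index
# as both row and column labels instead of the default 0..n-1.
dist = pd.DataFrame(
    squareform(pdist(group[["X", "Y"]].to_numpy())),
    index=group.index,
    columns=group.index,
)

# Every point is at distance 0 from itself, so without masking
# idxmin simply returns each column's own label.
closest = dist.idxmin()
```

Here `closest` maps each ID to itself, which is why the diagonal has to be excluded before looking for the nearest neighbour.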

dmargol1
  • In the workflow I replace the 0 values with NaN values to avoid this issue – Gman Jul 27 '15 at 01:19
  • Only danger there is if another row shares the same coordinates with the original point. Creating a mask along the diagonal is probably the best. – dmargol1 Jul 27 '15 at 02:38
  • I see your point but all the points are unique so that shouldn't occur. – Gman Jul 27 '15 at 04:14
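
The difference between the two approaches discussed in these comments can be sketched as follows. This is an illustration under assumptions, not the asker's actual workflow: the data is invented, and points `a` and `d` deliberately share coordinates to show the case dmargol1 describes. `np.fill_diagonal` is used here as one way to mask only the self-distances.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Hypothetical data: "a" and "d" deliberately share the same coordinates.
group = pd.DataFrame(
    {"X": [0.0, 1.0, 5.0, 0.0], "Y": [0.0, 0.0, 0.0, 0.0]},
    index=["a", "b", "c", "d"],
)

values = squareform(pdist(group[["X", "Y"]].to_numpy()))

# Mask only the diagonal (self-distances); genuine zero distances
# between distinct points are left untouched.
np.fill_diagonal(values, np.nan)
dist = pd.DataFrame(values, index=group.index, columns=group.index)
closest = dist.idxmin()  # idxmin skips NaN by default

# For comparison: replacing every 0 with NaN also hides the duplicate point.
closest_zero_replaced = dist.replace(0, np.nan).idxmin()
```

With the diagonal mask, `a` and `d` report each other as nearest neighbours. With the blanket zero replacement, both fall back to `b`. If all points are known to be unique, as Gman says, the two approaches give the same result.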