I have a pandas DataFrame with columns "X" and "Y". For each row, I want the maximum Euclidean distance to any other row. I know I can do this with nested loops such as:

for i_df,row in df.iterrows():
    max_dist=0
    for i_others,other_row in df.iterrows():
        xdiff = row.X - other_row.X
        ydiff = row.Y - other_row.Y
        dist = np.sqrt(xdiff**2 + ydiff**2)
        if dist>max_dist:
            max_dist=dist
    df.loc[i_df, 'max_dist'] = max_dist  # index by row label first, then column

Is there a computationally faster or more pythonic way to do this?

wjandrea
statHacker
  • Various techniques for calculating all the pairwise distances in a numpy array: https://stackoverflow.com/questions/22720864/efficiently-calculating-a-euclidean-distance-matrix-using-numpy. So one option is `df.to_numpy()`, then an appropriate solution from there. – slothrop May 30 '23 at 16:16
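A minimal sketch of the NumPy route mentioned in this comment, assuming the full n × n distance matrix fits in memory. The helper name `max_pairwise_dist` and the sample data are made up for illustration; they are not from the question.

```python
import numpy as np
import pandas as pd


def max_pairwise_dist(points: np.ndarray) -> np.ndarray:
    """For each row of an (n, d) array, return the largest Euclidean distance to any row."""
    # diff[i, j] holds the coordinate-wise difference between point i and point j
    diff = points[:, None, :] - points[None, :, :]
    # Summing squares over the coordinate axis gives the (n, n) distance matrix
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Row-wise maximum: the farthest point from each point
    return dist.max(axis=1)


# Hypothetical sample data: the corners of a 3-4-5 right triangle
df = pd.DataFrame({"X": [0.0, 3.0, 0.0], "Y": [0.0, 0.0, 4.0]})
df["max_dist"] = max_pairwise_dist(df[["X", "Y"]].to_numpy())
```

The distance from a row to itself is 0, so including it in the maximum does not change the result.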

1 Answer

You can use cdist from SciPy to get the array of all pairwise distances, then take the maximum of each row:

#pip install scipy
from scipy.spatial.distance import cdist

df["max_dist"] = cdist(df[["X", "Y"]], df[["X", "Y"]]).max(axis=1)
Timeless
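One caveat with the `cdist` one-liner is that it materializes an n × n matrix, which can get large for big frames. Below is a hedged sketch of a chunked variant that calls `cdist` on a block of rows at a time. The function name `max_dist_chunked`, the `chunk_size` parameter, and the sample data are assumptions for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist


def max_dist_chunked(points: np.ndarray, chunk_size: int = 1000) -> np.ndarray:
    """Row-wise maximum Euclidean distance, computed one block of rows at a time."""
    out = np.empty(len(points))
    for start in range(0, len(points), chunk_size):
        stop = start + chunk_size
        # Distances from this block of rows to every row: shape (block, n)
        block = cdist(points[start:stop], points)
        out[start:stop] = block.max(axis=1)
    return out


# Hypothetical sample data: the corners of a 3-4-5 right triangle
df = pd.DataFrame({"X": [0.0, 3.0, 0.0], "Y": [0.0, 0.0, 4.0]})
df["max_dist"] = max_dist_chunked(df[["X", "Y"]].to_numpy(), chunk_size=2)
```

With this layout, peak memory is roughly `chunk_size * n` floats rather than `n * n`, at the cost of a Python-level loop over blocks.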