Fastest method to get max distance of a row from all other rows in pandas dataframe

Question

I have a pandas dataframe with columns "X" and "Y". I want to obtain for each row the single maximum distance from all other rows. I know I can do this with nested loops such as:

import numpy as np

for i_df, row in df.iterrows():
    max_dist = 0
    for i_others, other_row in df.iterrows():
        xdiff = row.X - other_row.X
        ydiff = row.Y - other_row.Y
        dist = np.sqrt(xdiff**2 + ydiff**2)
        if dist > max_dist:
            max_dist = dist
    # Label-based assignment: row i_df, column 'max_dist'
    df.loc[i_df, 'max_dist'] = max_dist

Is there a computationally faster or more pythonic way to do this?

Various techniques for calculating all the pairwise distances in a numpy array: https://stackoverflow.com/questions/22720864/efficiently-calculating-a-euclidean-distance-matrix-using-numpy. So one option is `df.to_numpy()`, then an appropriate solution from there. — slothrop, May 30 '23 at 16:16
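As a rough illustration of the `df.to_numpy()` route mentioned in the comment, here is a minimal sketch using NumPy broadcasting. The sample data is made up for the example; only the column names "X" and "Y" come from the question. It builds the full n × n distance matrix, so memory use grows quadratically with the number of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names "X" and "Y" come from the question.
df = pd.DataFrame({"X": [0.0, 3.0, 0.0], "Y": [0.0, 4.0, 1.0]})

pts = df[["X", "Y"]].to_numpy()            # shape (n, 2)

# Broadcasting: pts[:, None, :] is (n, 1, 2) and pts[None, :, :] is (1, n, 2),
# so their difference holds every pairwise coordinate difference, shape (n, n, 2).
diff = pts[:, None, :] - pts[None, :, :]

# Euclidean distance for every pair of rows, shape (n, n).
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Largest distance in each row of the matrix.
df["max_dist"] = dist.max(axis=1)
```

The distance from a row to itself is 0, so including the diagonal does not change the maximum.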

score 1 · Answer 1 · answered May 30 '23 at 17:21

You can use `cdist` from SciPy to compute the full matrix of pairwise distances, then take the maximum of each row:

#pip install scipy
from scipy.spatial.distance import cdist

df["max_dist"] = cdist(df[["X", "Y"]], df[["X", "Y"]]).max(axis=1)
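The one-liner above materializes an n × n matrix, which can get large for big dataframes. If that becomes a problem, one possible workaround (an assumption on my part, not part of the original answer) is to call `cdist` on blocks of rows. The helper name `max_dist_chunked`, the `chunk_size` parameter, and the random sample data below are all hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

def max_dist_chunked(points, chunk_size=1000):
    """Per-row maximum Euclidean distance, computed chunk_size rows at a time.

    Hypothetical helper, not a SciPy or pandas function.
    """
    points = np.asarray(points, dtype=float)
    out = np.empty(len(points))
    for start in range(0, len(points), chunk_size):
        block = points[start:start + chunk_size]
        # cdist(block, points) has shape (len(block), n): one slab of the full matrix.
        out[start:start + chunk_size] = cdist(block, points).max(axis=1)
    return out

# Hypothetical sample data using the question's column names.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((2500, 2)), columns=["X", "Y"])
df["max_dist"] = max_dist_chunked(df[["X", "Y"]].to_numpy(), chunk_size=1000)
```

Peak memory is then roughly `chunk_size` × n floats instead of n × n, at the cost of a Python-level loop over the chunks.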

answered May 30 '23 at 17:21

Timeless
