
I have the following pandas dataframe

import pandas as pd    
df = pd.DataFrame(zip(["A","B", "C", "D"],[10,30,55,60]), columns=["Name", "Distance"])

Out:

  Name  Distance
0    A        10
1    B        30
2    C        55
3    D        60

Now I would like to build a distance matrix from it, i.e. the absolute difference in Distance between every pair of names:

[image: expected distance matrix]

Does anyone know how to do this efficiently in Python?

henry
  • Look into https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html – Learning is a mess Jan 13 '22 at 10:29
  • Does this help? https://stackoverflow.com/questions/64008893/create-a-distance-matrix-from-pandas-dataframe-using-a-bespoke-distance-function – sagi Jan 13 '22 at 10:34

1 Answer


You can use scipy.spatial.distance.cdist:

from scipy.spatial.distance import cdist
pd.DataFrame(cdist(df[['Distance']], df[['Distance']]),
             index=df['Name'], columns=df['Name'])

or sklearn.metrics.pairwise_distances:

from sklearn.metrics import pairwise_distances
pd.DataFrame(pairwise_distances(df[['Distance']]),
             index=df['Name'], columns=df['Name'])

or, simply, raw numpy with broadcasting:

a = df['Distance'].values
pd.DataFrame(abs(a-a[:,None]), index=df['Name'], columns=df['Name'])
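
To make the broadcasting step less opaque, here is a minimal sketch that unpacks the one-liner into its intermediate arrays. The names col, diff and matrix are only illustrative, and the dataframe is rebuilt from a dict so the snippet runs on its own:

```python
import numpy as np
import pandas as pd

# Same data as in the question, rebuilt so the snippet is self-contained.
df = pd.DataFrame({"Name": ["A", "B", "C", "D"], "Distance": [10, 30, 55, 60]})

a = df["Distance"].to_numpy()  # 1-D array, shape (4,)
col = a[:, None]               # same values as a column, shape (4, 1)

# NumPy stretches (4,) against (4, 1) into a (4, 4) grid,
# so diff[i, j] == a[j] - a[i] for every pair of rows.
diff = a - col

# The absolute value turns the signed differences into distances.
matrix = pd.DataFrame(np.abs(diff), index=df["Name"], columns=df["Name"])
print(matrix)
```

Note that this builds the full n x n array in memory, which is fine for small inputs like this one.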

NB. The scipy and sklearn approaches let you choose from a wide range of distance functions through their metric parameter, not only the Euclidean distance.
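
As a rough illustration of that flexibility, the sketch below passes a different metric to cdist. The helper distance_matrix and the callable relative_gap are hypothetical names made up for this example; "sqeuclidean" is one of scipy's built-in metric names, and cdist also accepts a Python callable that takes two 1-D arrays:

```python
import pandas as pd
from scipy.spatial.distance import cdist

# Same data as in the question, rebuilt so the snippet is self-contained.
df = pd.DataFrame({"Name": ["A", "B", "C", "D"], "Distance": [10, 30, 55, 60]})

def distance_matrix(frame, metric="euclidean"):
    """Hypothetical helper: labelled pairwise distances of the Distance column."""
    values = frame[["Distance"]].to_numpy(dtype=float)  # shape (n, 1)
    return pd.DataFrame(cdist(values, values, metric=metric),
                        index=frame["Name"], columns=frame["Name"])

# Built-in metric: squared Euclidean distance.
squared = distance_matrix(df, metric="sqeuclidean")

def relative_gap(u, v):
    """Hypothetical custom metric: gap relative to the larger of the two values."""
    return abs(u[0] - v[0]) / max(u[0], v[0])

# Custom metric: cdist calls relative_gap once per pair of rows.
relative = distance_matrix(df, metric=relative_gap)

print(squared)
print(relative)
```

A callable metric is evaluated in Python for every pair, so it will be noticeably slower than the built-in metric names on large inputs.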

output (from the numpy version; the scipy and sklearn versions return the same values as floats):

Name   A   B   C   D
Name                
A      0  20  45  50
B     20   0  25  30
C     45  25   0   5
D     50  30   5   0
mozway