
I have an array:

test_arr = np.array([ [1.2, 2.1, 2.3, 4.5],
                      [2.6, 6.4, 5.2, 6.2],
                      [7.2, 6.2, 2.5, 1.7],
                      [8.2, 7.6, 4.2, 7.3] ])

Is it possible to obtain a pandas dataframe of the form:

row_id  | row1  | row2          | row3          | row4
row1      0.0     d(row1,row2)    d(row1,row3)    d(row1,row4)
row2      ...     0.0             ...             ...
row3      ...     ...             0.0             ...
row4      ...     ...             ...             0.0

where d(row1, row2) is the Euclidean distance between row1 and row2.

What I am trying now is first generating a list of all pairs of rows, then computing the distance and assigning each element to the dataframe. Is there a better/faster way of doing this?
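
For comparison, here is a minimal sketch of the pair-by-pair approach described above. It is not the asker's actual code: the names `names` and `df_loop`, and the use of `itertools.combinations` and `np.linalg.norm`, are assumptions chosen for illustration.

```python
from itertools import combinations

import numpy as np
import pandas as pd

test_arr = np.array([[1.2, 2.1, 2.3, 4.5],
                     [2.6, 6.4, 5.2, 6.2],
                     [7.2, 6.2, 2.5, 1.7],
                     [8.2, 7.6, 4.2, 7.3]])

# Labels row1..row4, matching the layout sketched in the question.
names = [f"row{i}" for i in range(1, len(test_arr) + 1)]

# Start from an all-zero frame so the diagonal is already 0.0.
df_loop = pd.DataFrame(0.0, index=names, columns=names)

# Visit each unordered pair of rows once and fill both symmetric cells.
for i, j in combinations(range(len(test_arr)), 2):
    d = np.linalg.norm(test_arr[i] - test_arr[j])  # Euclidean distance
    df_loop.iloc[i, j] = d
    df_loop.iloc[j, i] = d

print(df_loop)
```

The Python-level loop runs once per pair of rows, so its cost grows quadratically with the number of rows, which is presumably why a vectorized alternative is being asked for.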

Qubix
  • Does this answer your question? [Computing Euclidean distance for numpy in python](https://stackoverflow.com/questions/28687321/computing-euclidean-distance-for-numpy-in-python) – Ben.T May 26 '20 at 11:50

3 Answers

from scipy import spatial
import numpy as np

test_arr = np.array([ [1.2, 2.1, 2.3, 4.5],
                      [2.6, 6.4, 5.2, 6.2],
                      [7.2, 6.2, 2.5, 1.7],
                      [8.2, 7.6, 4.2, 7.3] ])

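# Condensed vector of pairwise Euclidean distances (one entry per pair of rows)
dist = spatial.distance.pdist(test_arr)
# Expand the condensed vector into the full symmetric 4x4 distance matrix
spatial.distance.squareform(dist)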

Result:

array([[0.        , 5.63471383, 7.79037868, 9.52365476],
       [5.63471383, 0.        , 6.98140387, 5.91692488],
       [7.79037868, 6.98140387, 0.        , 6.1       ],
       [9.52365476, 5.91692488, 6.1       , 0.        ]])
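
To get the labelled table the question asks for, the square matrix can be wrapped in a DataFrame. The sketch below also prints the shape of the intermediate `pdist` result, which is a condensed 1-D vector with one entry per unordered pair of rows rather than a matrix. The variable names `condensed`, `labels`, and `df` are illustrative assumptions; the `row1`..`row4` labels follow the question.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

test_arr = np.array([[1.2, 2.1, 2.3, 4.5],
                     [2.6, 6.4, 5.2, 6.2],
                     [7.2, 6.2, 2.5, 1.7],
                     [8.2, 7.6, 4.2, 7.3]])

# pdist uses the Euclidean metric by default and returns a condensed vector.
condensed = pdist(test_arr)
print(condensed.shape)  # 4 rows give 4 * 3 / 2 = 6 pairwise distances

# Row labels as requested in the question.
labels = [f"row{i}" for i in range(1, len(test_arr) + 1)]

# squareform expands the condensed vector to a symmetric matrix with a zero diagonal.
df = pd.DataFrame(squareform(condensed), index=labels, columns=labels)
print(df)
```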
Stef
import pandas as pd
from sklearn.metrics.pairwise import euclidean_distances

pd.DataFrame(euclidean_distances(test_arr, test_arr))  # test_arr as defined in the question

          0         1         2         3
0  0.000000  5.634714  7.790379  9.523655
1  5.634714  0.000000  6.981404  5.916925
2  7.790379  6.981404  0.000000  6.100000
3  9.523655  5.916925  6.100000  0.000000
Bertil Johannes Ipsen

Use cdist to compute the pairwise distances, then place the resulting 2D array into a pandas DataFrame:

import numpy as np
from scipy.spatial.distance import cdist
import pandas as pd

test_arr = np.array([ [1.2, 2.1, 2.3, 4.5],
                      [2.6, 6.4, 5.2, 6.2],
                      [7.2, 6.2, 2.5, 1.7],
                      [8.2, 7.6, 4.2, 7.3] ])

# Use cdist to compute pairwise distances
dist = cdist(test_arr, test_arr)

# Place into Pandas DataFrame
# index and names of columns
names = ['row' + str(i) for i in range(1, dist.shape[0]+1)]
df = pd.DataFrame(dist, columns = names, index = names)

print(df)

Output

Pandas DataFrame

          row1      row2      row3      row4
row1  0.000000  5.634714  7.790379  9.523655
row2  5.634714  0.000000  6.981404  5.916925
row3  7.790379  6.981404  0.000000  6.100000
row4  9.523655  5.916925  6.100000  0.000000
DarrylG