
Here's how I use the haversine library to calculate the distance between two points:

import haversine as hs
# haversine expects (lat, lon) tuples in degrees and returns kilometers by default
hs.haversine((-1.94091666666667, 106.11333888888888), (5.204783, 96.698661))

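Here's how to calculate the haversine distance using sklearn:

from sklearn.metrics.pairwise import haversine_distances
import numpy as np
import pandas as pd

# haversine_distances expects [lat, lon] pairs in radians
radian_1 = np.radians(df1[['lat','lon']])
radian_2 = np.radians(df2[['lat','lon']])
# multiply by the Earth radius (6371 km) to turn angular distances into kilometers
D = pd.DataFrame(haversine_distances(radian_1,radian_2)*6371,index=df1.index, columns=df2.index)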

What I need is the same cross-distance matrix, but computed with the haversine library instead of sklearn.metrics.pairwise.

Here's my dataset df1

   index       lon        lat
0   0   107.071969  -6.347778
1   1   110.431361  -7.773489
2   2   111.978469  -8.065442

and dataset df2

    index      lon        lat
5   5   112.340919  -7.520442
6   6   107.179119  -6.291131
7   7   106.807442  -6.437383

Here's the expected output

            5           6           7
0  596.019968   13.413123   30.882602
1  212.317223  394.942014  426.564799
2   72.573637  565.020998  598.409848
Nabih Bawazir
  • Can you check whether [this Q/A](https://stackoverflow.com/questions/25767596/vectorised-haversine-formula-with-a-pandas-dataframe) is what you need? – mozway Apr 04 '22 at 08:36
  • No, what I need is a cross-tab of distances. I will add my expected output. – Nabih Bawazir Apr 04 '22 at 08:40

2 Answers


Following the documentation and example for sklearn.metrics.pairwise.haversine_distances:

# 6371 is the mean Earth radius in km; it scales the angular distances (radians) to kilometers
result = haversine_distances(np.radians(df1[["lat", "lon"]]), np.radians(df2[["lat", "lon"]])) * 6371
result_df = pd.DataFrame(result, index=df1["index"], columns=df2["index"])

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>index</th>
      <th>5</th>
      <th>6</th>
      <th>7</th> </tr>
    <tr>
      <th>index</th>
      <th></th>
      <th></th>
      <th></th> </tr> </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>596.019968</td>
      <td>13.413123</td>
      <td>30.882602</td> </tr>
    <tr>
      <th>1</th>
      <td>212.317223</td>
      <td>394.942014</td>
      <td>426.564799</td> </tr>
    <tr>
      <th>2</th>
      <td>72.573637</td>
      <td>565.020998</td>
      <td>598.409848</td> </tr> </tbody> </table>

You first need to convert the latitudes and longitudes to radians. haversine_distances returns angular distances on a unit sphere, so you then multiply the result by the Earth's radius (about 6371 km) to get distances in kilometers.
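To make the two steps concrete (degrees to radians, then scaling by the Earth's radius), here is a minimal sketch of the haversine formula for a single pair of points using only the standard library. The function name `haversine_km` and the constant `EARTH_RADIUS_KM` are hypothetical names chosen for illustration; they are not part of sklearn or the haversine library.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, same value as used above


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    # Step 1: convert every coordinate from degrees to radians
    phi1, lam1, phi2, lam2 = map(radians, (lat1, lon1, lat2, lon2))
    # Haversine term for the central angle between the two points
    h = sin((phi2 - phi1) / 2) ** 2 + cos(phi1) * cos(phi2) * sin((lam2 - lam1) / 2) ** 2
    central_angle = 2 * asin(sqrt(h))  # angular distance on a unit sphere
    # Step 2: scale the angle by the Earth's radius to get kilometers
    return central_angle * EARTH_RADIUS_KM


# Row 0 of df1 against row 5 of df2 from the question
d = haversine_km(-6.347778, 107.071969, -7.520442, 112.340919)
print(round(d, 2))
```

For this pair the result should land close to the 596.02 km shown in the table above.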


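You can use itertools.product to generate every pair of rows and call haversine on each pair. Note that haversine expects points in (lat, lon) order, so select the columns explicitly:

import haversine as hs
import pandas as pd
import numpy as np
import itertools

# hs.haversine takes (lat, lon) pairs, so pick the columns in that order
pts1 = df1[['lat', 'lon']].values
pts2 = df2[['lat', 'lon']].values

res = []
for a, b in itertools.product(pts1, pts2):
    res.append(hs.haversine(a, b))

# one row per point in df1, one column per point in df2
df = pd.DataFrame(np.asarray(res).reshape(len(df1), len(df2)), index=df1.index, columns=df2.index)
print(df)

Output of the original run, which passed the points in (lon, lat) order and therefore does not match the expected output:

            0           1           2
0  587.500555   12.058061   29.557005
1  212.580742  365.487782  405.718803
2   46.333180  537.684789  578.072579

With the (lat, lon) order used above, the values should agree with the expected output in the question, up to a small difference in the Earth radius constant (the haversine library uses 6371.0088 km).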
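If the two frames have different lengths, or you want the result keyed by the original index labels (0, 1, 2 against 5, 6, 7), the same pairwise idea can be sketched with plain dictionaries. This is only an illustration under some assumptions: `great_circle_km` is a hypothetical stand-in for `hs.haversine` (it uses a radius of 6371 km), and `cross_table` is a made-up helper name. The points are written as (lat, lon).

```python
from itertools import product
from math import asin, cos, radians, sin, sqrt


def great_circle_km(p, q, radius_km=6371.0):
    """Hypothetical stand-in for hs.haversine; p and q are (lat, lon) in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * radius_km * asin(sqrt(h))


def cross_table(rows, cols, dist=great_circle_km):
    """rows and cols map a label to a (lat, lon) tuple; returns {row: {col: km}}."""
    table = {r: {} for r in rows}
    # product() visits every (row point, column point) combination once
    for (r, p), (c, q) in product(rows.items(), cols.items()):
        table[r][c] = dist(p, q)
    return table


# Points from the question, in (lat, lon) order; a 3 x 2 case to show non-square input
df1_points = {0: (-6.347778, 107.071969), 1: (-7.773489, 110.431361), 2: (-8.065442, 111.978469)}
df2_points = {5: (-7.520442, 112.340919), 6: (-6.291131, 107.179119)}

table = cross_table(df1_points, df2_points)
for label, row in table.items():
    print(label, {c: round(km, 2) for c, km in row.items()})
```

To swap in the library, pass `dist=hs.haversine`. The nested dictionary can be turned back into a frame with `pd.DataFrame.from_dict(table, orient="index")`, and the input dictionaries can be built from a frame with something like `dict(zip(df1.index, zip(df1["lat"], df1["lon"])))`.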
I'mahdi