3

I have a data frame with two columns, latitude and longitude, and 863 rows, so each row is a point defined by its latitude and longitude. I want to calculate the distance in kilometers between all the rows. I am using the reference link below to compute the distance between a pair of latitude/longitude points. With only a few rows I could apply that approach directly, but with this many rows I think I need a loop. Since I am new to Python, I have not been able to work out the logic for that loop.

Reference link: Getting distance between two points based on latitude/longitude

My data frame looks like this:

read_randomly_generated_lat_lon.head(3)
Lat          Lon
43.937845   -97.905537
44.310739   -97.588820
44.914698   -99.003517
ychaulagain
  • It would be useful to give us code that creates a part of your dataframe, so that we can help with the specific problem you have. – Vasilis D Apr 01 '19 at 21:54
  • @VasilisD Thanks. I edited my question now. – ychaulagain Apr 01 '19 at 22:13
  • Thanks. Since you have 863 rows, do you want to calculate the distances for all pairs, i.e. 863 * 862 / 2 values? If so, in which format do you want the output, in a matrix or...? – Vasilis D Apr 01 '19 at 22:18
  • That's correct. It would be great if I were able to store distance in a new column. – ychaulagain Apr 01 '19 at 22:25
  • This does not make sense. Each `Lat` `Lon` combination is a point. So you have to compare certain rows with other rows to calculate a distance. You cannot calculate the distance BETWEEN Lat and Lon. – Erfan Apr 01 '19 at 22:39

2 Answers

7

You can do this using scikit-learn:

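import numpy as np
# Newer scikit-learn releases also expose this class as sklearn.metrics.DistanceMetric
from sklearn.neighbors import DistanceMetric

# df is the question's data frame with columns Lat and Lon, in degrees
dfr = df.copy()
# The haversine metric expects [latitude, longitude] in radians
dfr.Lat = np.radians(df.Lat)
dfr.Lon = np.radians(df.Lon)
hs = DistanceMetric.get_metric("haversine")
# pairwise() returns distances on the unit sphere; scale by the Earth radius in km
(hs.pairwise(dfr)*6371)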

Output:

array([[  0.        ,  48.56264446, 139.2836099 ],
       [ 48.56264446,   0.        , 130.57312786],
       [139.2836099 , 130.57312786,   0.        ]])

Note that the output is a square, symmetric matrix, where element (i,j) is the distance in km between row i and row j.

This seems to be faster than using scipy's pdist with a custom haversine function.
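If a loop over pairs is preferred, or a long list of (row i, row j, distance) records is easier to store than a matrix, the same haversine formula can be sketched with the standard library alone. The names `haversine_km`, `EARTH_RADIUS_KM`, `points` and `pairs` are made up for this illustration, and the three points are the sample rows from the question; treat it as a minimal sketch rather than a tuned implementation.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # same Earth radius as used with the matrix above


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# The three sample rows from the question, as (lat, lon) in degrees.
points = [
    (43.937845, -97.905537),
    (44.310739, -97.588820),
    (44.914698, -99.003517),
]

# One record per unordered pair (i, j) with i < j, i.e. the upper
# triangle of the square matrix without the zero diagonal.
pairs = [
    (i, j, haversine_km(*points[i], *points[j]))
    for i, j in combinations(range(len(points)), 2)
]

for i, j, km in pairs:
    print(f"row {i} - row {j}: {km:.2f} km")
```

For 863 rows this produces 863 * 862 / 2 = 371,953 records, so the vectorised matrix approach is likely the better choice when speed matters.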

arinarmo
  • 1
    So, what is the distance between `43.937845 -97.905537` and `44.310739 -97.588820` in this case? `48.56264446306492 km?` – BhishanPoudel Apr 02 '19 at 04:01
  • Exactly. The original question wants the distance "in a new column", but that doesn't make sense if OP wants the pairwise distance – arinarmo Apr 02 '19 at 16:24
4

Please note: the following script does not account for the curvature of the earth. Numerous resources, such as "Convert lat/long to XY", explain this problem.

However, the distance between coordinates can be roughly determined. The result is a Series of distances between consecutive rows, which can easily be concatenated with your original df to provide a separate distance column.

import numpy as np
import pandas as pd

d = {
    'Lat': [43.937845, 44.310739, 44.914698],
    'Long': [-97.905537, -97.588820, -99.003517],
}

df = pd.DataFrame(d)

df = df[['Lat','Long']]

point1 = df.iloc[0]

def to_xy(point):

    r = 6371000  # radius of the earth (m)
    # Rows are (Lat, Long), so lam receives the latitude and phi the longitude
    lam, phi = point
    cos_phi_0 = np.cos(np.radians(phi))

    return (r * np.radians(lam) * cos_phi_0, 
            r * np.radians(phi))

point1_xy = to_xy(point1)

df['to_xy'] = df.apply(lambda x: 
         tuple(x.values),
         axis=1).map(to_xy)

df['Y'], df['X'] = df.to_xy.str[0], df.to_xy.str[1]

df = df[['X','Y']] 
df = df.diff()

dist = np.sqrt(df['X']**2 + df['Y']**2)

#Convert to km
dist = dist/1000

print(dist)

0           NaN
1     41.149537
2    204.640462
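Note that the script above unpacks each (Lat, Long) row as `lam, phi`, so the cosine is taken of the longitude, and the first value (about 41.1 km) differs noticeably from the roughly 48.6 km haversine distance between the same two rows. A common form of the flat-earth (equirectangular) approximation scales the longitude difference by the cosine of a shared reference latitude instead. Below is a minimal sketch under that assumption, using the mean latitude of each pair as the reference; `equirect_km`, `EARTH_RADIUS_KM` and `consecutive` are names invented for this illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius in km


def equirect_km(lat1, lon1, lat2, lon2):
    """Flat-earth (equirectangular) distance in km; inputs in degrees."""
    # Shared reference latitude for the pair (assumption: mean of the two).
    phi0 = math.radians((lat1 + lat2) / 2)
    x = math.radians(lon2 - lon1) * math.cos(phi0)  # east-west component
    y = math.radians(lat2 - lat1)                   # north-south component
    return EARTH_RADIUS_KM * math.hypot(x, y)


# The three sample rows, as (lat, lon) in degrees.
points = [
    (43.937845, -97.905537),
    (44.310739, -97.588820),
    (44.914698, -99.003517),
]

# Distance from each row to the previous one, mirroring df.diff() above.
consecutive = [equirect_km(*p, *q) for p, q in zip(points, points[1:])]
print(consecutive)
```

Over distances of this size the approximation should stay close to the haversine values; it degrades for points that are far apart or near the poles.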
jonboy