
I would like to calculate the distance along a path of GPS coordinates which are stored in two columns in a data frame.

import pandas as pd

df = pd.DataFrame({ 'lat' : [1, 2.5, 3, 1.2],
                    'lng' : [1, 1, 2.1, 1],
                    'label': ['foo', 'bar', 'zip', 'foo']})
print(df)

Output

  label  lat  lng
0   foo  1.0  1.0
1   bar  2.5  1.0
2   zip  3.0  2.1
3   foo  1.2  1.0

The GPS coordinates are stored in radians, so the great-circle distance in kilometers (Earth radius 6371 km) between the first and second rows of the dataframe can be calculated with the spherical law of cosines:

import math as m

r1 = 0
r2 = 1

distance = m.acos(m.sin(df.lat[r1]) * m.sin(df.lat[r2]) +
                  m.cos(df.lat[r1]) * m.cos(df.lat[r2]) * m.cos(df.lng[r2] - df.lng[r1])) * 6371

I would like to repeat this calculation for every pair of consecutive rows and then sum the segment distances to get the total length of the full path.

I could put this into a loop for n-1 rows of the dataframe, but is there a more pythonic way to do this?

MaxU - stand with Ukraine
edge-case

1 Answer


Vectorized Haversine function:

import numpy as np

def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    """
    Slightly modified version of http://stackoverflow.com/a/29546836/2901002

    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees or in radians).

    All (lat, lon) coordinates must have numeric dtypes and be of equal length.
    The result is in the units of earth_radius (kilometers by default).
    """
    # Convert from decimal degrees to radians unless the input is already in radians
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])

    # Haversine term: square of half the chord length between the two points
    a = np.sin((lat2-lat1)/2.0)**2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2

    return earth_radius * 2 * np.arcsin(np.sqrt(a))

Solution:

df['dist'] = haversine(df['lat'], df['lng'],
                       df['lat'].shift(), df['lng'].shift(),
                       to_radians=False)

Result:

In [65]: df
Out[65]:
  label  lat  lng          dist
0   foo  1.0  1.0           NaN
1   bar  2.5  1.0   9556.500000
2   zip  3.0  2.1   7074.983158
3   foo  1.2  1.0  10206.286067
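
The question also asks for the length of the whole path. A minimal sketch of that last step, assuming the same example dataframe with coordinates in radians: `haversine_rad`, `segments`, and `total_km` are names made up for this illustration, and `haversine_rad` is only a compact restatement of the formula so that the snippet runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'lat': [1, 2.5, 3, 1.2],
                   'lng': [1, 1, 2.1, 1],
                   'label': ['foo', 'bar', 'zip', 'foo']})

def haversine_rad(lat1, lon1, lat2, lon2, earth_radius=6371):
    # Hypothetical helper: haversine distance for inputs already in radians.
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return earth_radius * 2 * np.arcsin(np.sqrt(a))

# shift() pairs each row with the previous one, so row 0 has no partner (NaN).
segments = haversine_rad(df['lat'], df['lng'],
                         df['lat'].shift(), df['lng'].shift())

# Series.sum() skips NaN by default, so the first row does not affect the total.
total_km = segments.sum()
print(total_km)
```

Because `shift()` leaves the first row without a predecessor, there are n-1 segment distances for n points, which is the number of hops along the path.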
MaxU - stand with Ukraine
  • Thanks, the haversine function works when I input numbers for lat1, lon1, etc, but when I try it on my actual dataframe it returns the error: `AttributeError: 'float' object has no attribute 'sin' `. I'm able to print `np.sin(dlat/2.0)**2` from the loop with no problem. Any ideas on how I can troubleshoot this? – edge-case Apr 17 '17 at 15:57
  • @omomo, make sure that you don't have variables named `np` and that you don't have typos... Also make sure that you pass at least four arguments to that function - `lat1, lon1, lat2, lon2` – MaxU - stand with Ukraine Apr 17 '17 at 16:00
  • Is there a way to share a dataframe on Stack Overflow, perhaps as a .pkl file? When I create the dataframe from scratch with numbers, it works. But the actual dataframe I'm using is built by importing data from an Excel file. The Excel dataframe and the built-from-scratch dataframe look identical and the dtypes are the same, so I can't figure out what's different. – edge-case Apr 17 '17 at 16:51
  • @omomo, you can upload your `pkl` file to any freeware file-exchange web-site and post here a link to it – MaxU - stand with Ukraine Apr 17 '17 at 16:55
  • Thanks for your help. The issue was that when the dataframe was loaded from the Excel file, some values were NaN. Even though those values were dropped, the column type remained "object" instead of float64. I had to force the columns in the dataframe to be float64 with `df = df.apply(pd.to_numeric)` – edge-case Apr 20 '17 at 13:31
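
The dtype problem described in the last comment can be reproduced roughly as follows. This is only a sketch: the `raw` dataframe is a hypothetical stand-in for data loaded from Excel, and the exception type raised by NumPy on object columns appears to depend on the NumPy version, so both `AttributeError` and `TypeError` are caught here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Excel import: object dtype with one missing value.
raw = pd.DataFrame({'lat': [1.0, 2.5, None, 1.2],
                    'lng': [1.0, 1.0, 2.1, 1.0]}, dtype=object)

clean = raw.dropna()      # the incomplete row is gone...
print(clean.dtypes)       # ...but both columns are still dtype object

try:
    np.sin(clean['lat'])  # NumPy ufuncs fail on object columns holding floats
    ufunc_failed = False
except (AttributeError, TypeError):
    ufunc_failed = True

# Coerce every column to a numeric dtype, as in the comment above.
fixed = clean.apply(pd.to_numeric)
print(fixed.dtypes)       # float64 for both columns
sin_lat = np.sin(fixed['lat'])  # now works element-wise
```

Checking `df.dtypes` before calling a vectorized function is a cheap way to catch this, since two dataframes can print identically while holding different dtypes.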