
I have a pandas DataFrame my_df with the following columns:

id  lat1 lon1 lat2 lon2
1   45   0    41   3
2   40   1    42   4
3   42   2    37   1

Basically, I'd like to do the following:

import haversine

haversine.haversine((45, 0), (41, 3)) # just to show syntax of haversine()
> 507.20410687342115

# what I'd like to do
my_df["dist"] = haversine.haversine((my_df["lat1"], my_df["lon1"]),(my_df["lat2"], my_df["lon2"]))

TypeError: cannot convert the series to < class 'float' >

Using this, I tried the following:

my_df['dist'] = haversine.haversine(
        list(zip(*[my_df[['lat1','lon1']][c].values.tolist() for c in my_df[['lat1','lon1']]]))
        , 
        list(zip(*[my_df[['lat2','lon2']][c].values.tolist() for c in my_df[['lat2','lon2']]]))
        )

File "blabla\lib\site-packages\haversine\__init__.py", line 20, in haversine
    lat1, lng1 = point1

ValueError: too many values to unpack (expected 2)

Any idea what I'm doing wrong, or how I can achieve what I want?

François M.
  • possible dupe: https://stackoverflow.com/questions/25767596/vectorised-haversine-formula-with-a-pandas-dataframe – EdChum Jul 12 '17 at 10:20

2 Answers


Use apply with axis=1:

my_df["dist"] = my_df.apply(lambda row : haversine.haversine((row["lat1"], row["lon1"]),(row["lat2"], row["lon2"])), axis=1)

The haversine function only understands scalar values, not array-like values, hence the error. To call it on each row, use apply with axis=1: this iterates row-wise, so each column value can be accessed as a scalar and passed in the form the function expects.
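The row-wise call can be tried without installing anything beyond pandas. The sketch below is only an illustration: `scalar_haversine` is a hypothetical stand-in for `haversine.haversine`, written here with the `math` module and assuming an earth radius of 6371 km, and the DataFrame is rebuilt from the table in the question.

```python
import math

import pandas as pd


def scalar_haversine(point1, point2, earth_radius=6371):
    """Hypothetical scalar stand-in for haversine.haversine.

    Takes two (lat, lon) tuples in decimal degrees and returns kilometres.
    """
    lat1, lon1 = map(math.radians, point1)
    lat2, lon2 = map(math.radians, point2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return earth_radius * 2 * math.asin(math.sqrt(a))


# Sample data copied from the question
my_df = pd.DataFrame({
    "id": [1, 2, 3],
    "lat1": [45, 40, 42], "lon1": [0, 1, 2],
    "lat2": [41, 42, 37], "lon2": [3, 4, 1],
})

# axis=1 hands the lambda one row at a time, so row["lat1"] is a scalar
my_df["dist"] = my_df.apply(
    lambda row: scalar_haversine((row["lat1"], row["lon1"]),
                                 (row["lat2"], row["lon2"])),
    axis=1,
)
print(my_df)

# Passing whole columns instead hands a Series to math.radians, which fails
try:
    scalar_haversine((my_df["lat1"], my_df["lon1"]),
                     (my_df["lat2"], my_df["lon2"]))
except TypeError as exc:
    print("column-wise call failed:", exc)
```

Because apply with axis=1 runs a Python-level loop over the rows, it is convenient for small frames but is likely to be slower than a vectorised implementation on large ones.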

Also, I don't know what the difference is, but there is a vectorised version of the haversine formula.

EdChum

What about using a vectorized approach?

import numpy as np
import pandas as pd

# vectorized haversine function
def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    """
    Slightly modified version of http://stackoverflow.com/a/29546836/2901002

    Calculate the great-circle distance between two points
    on the earth (specified in decimal degrees or in radians).

    All (lat, lon) coordinates must have numeric dtypes and be of equal length.
    The result is in the units of earth_radius (kilometres by default).
    """
    if to_radians:
        # np.radians converts all four inputs at once; unpack them again
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])

    a = np.sin((lat2 - lat1) / 2.0)**2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0)**2

    return earth_radius * 2 * np.arcsin(np.sqrt(a))

Demo:

In [38]: df
Out[38]:
   id  lat1  lon1  lat2  lon2
0   1    45     0    41     3
1   2    40     1    42     4
2   3    42     2    37     1

In [39]: df['dist'] = haversine(df.lat1, df.lon1, df.lat2, df.lon2)

In [40]: df
Out[40]:
   id  lat1  lon1  lat2  lon2        dist
0   1    45     0    41     3  507.204107
1   2    40     1    42     4  335.876312
2   3    42     2    37     1  562.543582
MaxU - stand with Ukraine
  • `AttributeError: 'numpy.float64' object has no attribute 'radians'` :( – François M. Jul 13 '17 at 08:59
  • @fmalaussena, make sure that you haven't overwritten `np` (the alias for `numpy`) with some `float64` variable. If you don't use the "classical" numpy alias `np`, then you can use either `numpy.radians` or `pd.np.radians` – MaxU - stand with Ukraine Jul 13 '17 at 09:01
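
The situation described in the comment above can be reproduced in a few lines. This is a minimal sketch, assuming the alias `np` was accidentally rebound to a `float64` value somewhere earlier in the session; the variable names `angles_deg`, `numpy_module`, `shadow_error` and `restored` are made up for the illustration.

```python
import numpy as np

angles_deg = [45, 0, 41, 3]
print(np.radians(angles_deg))  # works: np still refers to the numpy module

numpy_module = np  # keep a handle on the module so the alias can be restored

# Accidental rebinding: np is now a float64 value, not the module
np = np.float64(1.0)

try:
    np.radians(angles_deg)
except AttributeError as exc:
    shadow_error = str(exc)
    print(shadow_error)

# Restoring the alias (or restarting the session) makes the call work again
np = numpy_module
restored = np.radians(angles_deg)
print(restored)
```

Checking `type(np)` in the failing session is a quick way to confirm whether the alias still points at the module.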