I have a dataframe as follows:

      payeeId    latHome   longHome     Total_Amnt
 0  193fde722     0.000000   0.000000        15.0
 1  4d8ecb2b5c   28.425515  77.097547        10.0
 2  2c3ea738     28.542923  77.253164        20.0
 3  2961f3e8     28.542898  77.253162        10.0
 4  5cda3d3763   28.461630  77.031944    129000.0
 5  3cb02ccbfc   26.180680  91.740042       220.0
 6  79918aae03    0.000000   0.000000      1760.0

I am trying to compute the distance between consecutive (latHome, longHome) pairs, i.e. between each row and the previous one. To do so, I have followed this SO post. Below is the function I am using and how I apply it:

def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
   if to_radians:
      lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])
   a = np.sin((lat2-lat1)/2.0)**2 + \
       np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2
   return earth_radius * 2 * np.arcsin(np.sqrt(a))

df_c['Dist_p'] = df_c.apply(haversine(lat1=df_c['latHome'].astype(float).shift(), \
                            lon1 = df_c['longHome'].astype(float).shift(), \
                            lat2 = df_c['latHome'].astype(float), \
                            lon2=df_c['longHome'].astype(float)))

But I am getting the following error:

ValueError: no results

I also get the error below when I call this function directly, i.e. without apply.

File "<ipython-input-1-3a6c757b499d>", line 13, in haversine
lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])

TypeError: loop of ufunc does not support argument 0 of type Series which has no callable radians method

Any clue will be appreciated.

pythondumb

1 Answer

  • There are multiple libraries and functions for calculating distance. I have used geopy.
  • The key is aligning rows so that each row carries both points to pass to the chosen distance function. This example uses shift(-1), which pairs each row with the next one (use shift() to pair with the previous row instead). Passing NaN to the geopy distance function is invalid, so the last row, which has no next row, is filled with (0, 0).
import io

import geopy.distance
import pandas as pd

df = pd.read_csv(io.StringIO("""      payeeId    latHome   longHome     Total_Amnt
 0  193fde722     0.000000   0.000000        15.0
 1  4d8ecb2b5c   28.425515  77.097547        10.0
 2  2c3ea738     28.542923  77.253164        20.0
 3  2961f3e8     28.542898  77.253162        10.0
 4  5cda3d3763   28.461630  77.031944    129000.0
 5  3cb02ccbfc   26.180680  91.740042       220.0
 6  79918aae03    0.000000   0.000000      1760.0"""), sep=r"\s+")


# prep tuples to pass to geopy.distance
df['Dist_p'] = df.loc[:, ["latHome", "longHome"]].join(
    df.loc[:, ["latHome", "longHome"]].shift(-1).fillna(0), rsuffix="_2"
).apply(
    lambda r: geopy.distance.geodesic(
        (r["latHome"], r["longHome"]), (r["latHome_2"], r["longHome_2"])
    ).km,
    axis=1,
)

      payeeId  latHome  longHome  Total_Amnt       Dist_p
 0  193fde722        0         0          15      8753.19
 1  4d8ecb2b5c  28.4255   77.0975          10      20.0375
 2  2c3ea738    28.5429   77.2532          20   0.00277761
 3  2961f3e8    28.5429   77.2532          10      23.4558
 4  5cda3d3763  28.4616   77.0319      129000      1476.46
 5  3cb02ccbfc  26.1807     91.74         220      10189.4
 6  79918aae03        0         0        1760            0
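
If you would rather keep the haversine formula from the question, here is a minimal sketch of a vectorized version that avoids apply entirely. It rests on some assumptions: the name haversine_km is made up for illustration, the data frame is rebuilt inline from the sample data, and each input is converted to a float array on its own before np.radians is called, which is likely what the TypeError in the question is complaining about (a ufunc receiving object-typed data). It uses shift() as in the question, so each row gets the distance from the previous row and the first row is NaN. Haversine treats the earth as a sphere, so the values differ slightly from the geodesic distances above.

```python
import numpy as np
import pandas as pd


def haversine_km(lat1, lon1, lat2, lon2, earth_radius=6371.0):
    """Great-circle distance in km for scalars, arrays or Series given in degrees."""
    # Convert each argument separately to a float array, then to radians.
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(v, dtype=float)) for v in (lat1, lon1, lat2, lon2)
    )
    a = (
        np.sin((lat2 - lat1) / 2.0) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2
    )
    return earth_radius * 2 * np.arcsin(np.sqrt(a))


# Sample coordinates copied from the question.
df = pd.DataFrame(
    {
        "latHome": [0.0, 28.425515, 28.542923, 28.542898, 28.461630, 26.180680, 0.0],
        "longHome": [0.0, 77.097547, 77.253164, 77.253162, 77.031944, 91.740042, 0.0],
    }
)

# Call the function directly on whole columns (no apply needed).
# shift() pairs each row with the previous one; row 0 has no predecessor, so it is NaN.
df["Dist_p"] = haversine_km(
    df["latHome"].shift(), df["longHome"].shift(), df["latHome"], df["longHome"]
)
print(df)
```

Note that df.apply expects a function as its first argument, whereas the question passes the result of calling haversine(...). With a vectorized function like this one, calling it directly on the columns is enough.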
Rob Raymond