I'm trying to compare the latitude and longitude in each successive row of a dataframe to get the distance between consecutive points.

                                lat        long
        name                                   
        Veronica Session  11.463798   14.136215
        Lynne Donahoo     44.405370  -82.350737
        Debbie Hanley     14.928905  -91.344523
        Lisandra Earls    68.951464 -138.976699
        Sybil Leef        -1.678356   33.959323

I'm using the code below, taken from this solution (Pandas Latitude-Longitude to distance between successive rows), but I get this error: "TypeError: cannot do slice indexing on Index with these indexers [1] of type int". I was unable to resolve it; I suspect it's a basic mistake. Any help would be appreciated.

df = pd.DataFrame(Child_data)

def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])
    a = np.sin((lat2-lat1)/2.0)**2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2
    return earth_radius * 2 * np.arcsin(np.sqrt(a))

df['dist'] = \
haversine(df.lat.shift(), df.long.shift(),
             df.loc[1:,'lat'], df.loc[1:,'long'],to_radians=False)
Milo
  • Thanks to everyone who helped me expand my solution slightly on my last question (https://stackoverflow.com/questions/74670372). – Milo Dec 05 '22 at 20:42

1 Answer

It looks to me like you need to use the following:

df['dist'] = haversine(df['lat'].shift(), df['long'].shift(), df['lat'], df['long'])  

which for your example yields:

    name                lat         long          dist
0   Veronica Session    11.463798   14.136215     NaN
1   Lynne Donahoo       44.405370   -82.350737    9625.250650
2   Debbie Hanley       14.928905   -91.344523    3385.895548
3   Lisandra Earls      68.951464   -138.976699   6859.237677
4   Sybil Leef          -1.678356   33.959323     12515.847395
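
For reference, here is a minimal, self-contained sketch of the approach above. The way `df` is built here is an assumption (the question creates it from `Child_data`, which is not shown), and `loc_error` is just an illustrative name. It also shows the likely cause of the original TypeError: with the names as the index, `df.loc[1:, 'lat']` is a label-based slice, and the integer 1 is not a valid label for a string index. Note that `to_radians` is left at its default of True, since the coordinates are in degrees.

```python
import numpy as np
import pandas as pd


def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    """Great-circle distance in km between two sets of points."""
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return earth_radius * 2 * np.arcsin(np.sqrt(a))


# Assumed reconstruction of the question's data, with "name" as the index.
df = pd.DataFrame(
    {
        "name": ["Veronica Session", "Lynne Donahoo", "Debbie Hanley",
                 "Lisandra Earls", "Sybil Leef"],
        "lat": [11.463798, 44.405370, 14.928905, 68.951464, -1.678356],
        "long": [14.136215, -82.350737, -91.344523, -138.976699, 33.959323],
    }
).set_index("name")

# .loc slices by label, so an integer bound fails on an index of names.
try:
    df.loc[1:, "lat"]
    loc_error = None
except TypeError as exc:
    loc_error = exc
print("df.loc[1:, 'lat'] raised:", loc_error)

# Pair every row with the previous row; the inputs are in degrees,
# so to_radians keeps its default value of True.
df["dist"] = haversine(df["lat"].shift(), df["long"].shift(),
                       df["lat"], df["long"])
print(df)
```

If positional slicing is really wanted, `df.iloc[1:]` selects by position regardless of the index type, but the result would then have one row fewer than the shifted columns, so passing the full columns as above is simpler.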
itprorh66
  • That does work, thank you. Why would the first result be NaN? If it's working through successive rows, shouldn't the last row be NaN? Just for my understanding. – Milo Dec 06 '22 at 08:56
  • The reason is that df['lat'].shift() and df['long'].shift() produce Series, while df.iloc[1, 'lat'] and df.iloc[1, 'long'] produce single values, so you are trying to compute an array against a single value. To verify, print out the results of df['lat'] and df.iloc[1, 'lat'] and compare them. – itprorh66 Dec 06 '22 at 13:57
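
On the question of where the NaN lands: `shift()` with its default of one period moves every value down one row, so each row is paired with the row before it, and the first row has no predecessor. A small sketch, using the first three latitudes from the question (the variable names are illustrative only), shows that `shift(-1)` pairs each row with the following one instead, which moves the NaN to the last row:

```python
import pandas as pd

lat = pd.Series([11.463798, 44.405370, 14.928905], name="lat")

prev_lat = lat.shift()     # value from the previous row; row 0 has none
next_lat = lat.shift(-1)   # value from the following row; last row has none

pairs = pd.DataFrame({"prev": prev_lat, "current": lat, "next": next_lat})
print(pairs)
```

Either direction gives the same set of distances; the choice only decides whether each distance is stored on the row where the leg ends or on the row where it starts.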