I'd like some help with a task. I'm a Python beginner and I'm trying to calculate the distance between sequential items, e.g. item1 to item2, then item2 to item3, and so on.

There's only one problem: in my dataframe I must partition these calculations by the field ZCGUNLEIT, as it indicates a route. Each ZCGUNLEIT has ~300 coordinates, and I must compute the distances between consecutive coordinates within that route before moving on to the next ZCGUNLEIT.

I tried the haversine library but couldn't understand how to integrate it with my dataframe.

If anyone can shed some light here, it will be appreciated.

Note: this dataframe has millions of rows.

list of items with lat and long

ANRIOS2020
  • https://stackoverflow.com/questions/19412462/getting-distance-between-two-points-based-on-latitude-longitude – Ran A Mar 29 '22 at 13:20

1 Answer

From an answer to the question "Getting distance between two points based on latitude/longitude":

The Haversine formula assumes the earth is a sphere, which results in errors of up to about 0.5% (according to help(geopy.distance)). Vincenty distance uses more accurate ellipsoidal models such as WGS-84, and is implemented in geopy. For example,

import geopy.distance

coords_1 = (52.2296756, 21.0122287)
coords_2 = (52.406374, 16.9251681)

# geopy 2.0 removed vincenty(); geodesic() is its ellipsoidal replacement.
print(geopy.distance.geodesic(coords_1, coords_2).km)

will print a distance of about 279.35 kilometers using the default ellipsoid WGS-84. (You can also choose .miles or one of several other distance units.)
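
For comparison, the spherical Haversine formula mentioned above can be sketched with the standard library alone. This is a minimal illustration, not the code of geopy or the haversine library: `haversine_km` and `EARTH_RADIUS_KM` are names chosen here, and the mean Earth radius is an assumed constant.

```python
import math

# Assumed mean Earth radius in km for the spherical model.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(point_1, point_2):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, point_1)
    lat2, lon2 = map(math.radians, point_2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


coords_1 = (52.2296756, 21.0122287)
coords_2 = (52.406374, 16.9251681)
print(haversine_km(coords_1, coords_2))
```

The result is slightly shorter than the ellipsoidal figure, which is consistent with the error bound quoted above.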

So, for your question, if your data is in a pandas DataFrame, for example:

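import geopy.distance
import pandas as pd

df = pd.DataFrame(
    data=[[53.2296756, 21.0122287], [52.406374, 16.9241681], [52.2296756, 21.0112287], [55.406374, 16.9231681]],
    columns=['LATITUDE', 'LONGITUDE'],
)

# The first row has no previous point, so its distance is 0.
dist = [0]
for i in range(1, len(df)):
    # Distance in km between row i and the row before it
    # (geodesic() replaces vincenty(), which geopy 2.0 removed).
    current = (df.LATITUDE.iloc[i], df.LONGITUDE.iloc[i])
    previous = (df.LATITUDE.iloc[i - 1], df.LONGITUDE.iloc[i - 1])
    dist.append(geopy.distance.geodesic(current, previous).km)

df['distance'] = dist
df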

Ran A
  • Thanks, it was very helpful! How would I go about partitioning the part where the iteration uses len(df), since not all the rows are sequential? They need to run inside the same route (ZCGUNLEIT) and then start again when the code for the route changes. – ANRIOS2020 Mar 29 '22 at 14:21
  • If you have that specific condition, you can add a check inside the loop: `if df.ZCGUNLEIT.iloc[i] == df.ZCGUNLEIT.iloc[i-1]:` compute the distance as before, `else: dist.append(0)`. That way each new route is separated from the previous one by a distance of 0. – Ran A Mar 29 '22 at 14:24
  • It worked! However, can you explain to me how *df.ZCGUNLEIT.iloc[i]==df.ZCGUNLEIT.iloc[i-1]* works? I couldn't figure out how i==i-1. – ANRIOS2020 Mar 29 '22 at 19:28
  • It compares the route in each row with the route in the preceding row, using the row index. Whenever the route differs from the preceding one, the distance is reset to 0, so the calculation starts over at the beginning of the new route. – Ran A Mar 30 '22 at 08:43
  • If the answer solved your question, it's preferable to upvote it and accept it as the answer. Best of luck ^^ – Ran A Mar 30 '22 at 08:43
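
With millions of rows, a Python-level loop over `iloc` can be slow. The per-route reset discussed in the comments can also be expressed with `groupby(...).shift()`, which pairs each row with the previous row of the same route. The following is a minimal sketch, not the answer's method. It assumes the rows are already ordered within each route, and it swaps geopy's ellipsoidal distance for the spherical Haversine formula, so results may differ by up to about 0.5%. The function name `add_route_distances` and the column names `LATITUDE` and `LONGITUDE` are assumptions; only `ZCGUNLEIT` comes from the question.

```python
import numpy as np
import pandas as pd

# Assumed mean Earth radius in km for the spherical model.
EARTH_RADIUS_KM = 6371.0088


def add_route_distances(df, route_col="ZCGUNLEIT", lat_col="LATITUDE", lon_col="LONGITUDE"):
    """Return a copy of df with a 'distance' column: km to the previous row of the same route."""
    out = df.copy()
    grouped = out.groupby(route_col, sort=False)
    # Previous point within the same route; NaN on the first row of each route.
    prev_lat = grouped[lat_col].shift()
    prev_lon = grouped[lon_col].shift()

    lat1, lon1 = np.radians(prev_lat), np.radians(prev_lon)
    lat2, lon2 = np.radians(out[lat_col]), np.radians(out[lon_col])
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    dist = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a.clip(0, 1)))
    # The first row of each route has no predecessor, so its distance is 0.
    out["distance"] = dist.fillna(0.0)
    return out


# Made-up sample data with two routes, "A" and "B".
df = pd.DataFrame({
    "ZCGUNLEIT": ["A", "A", "A", "B", "B"],
    "LATITUDE": [52.2296756, 52.406374, 52.2296756, 53.2296756, 55.406374],
    "LONGITUDE": [21.0122287, 16.9251681, 21.0122287, 21.0122287, 16.9231681],
})
result = add_route_distances(df)
print(result)
```

The first row of each route gets a distance of 0, matching the `else: dist.append(0)` branch in the loop version.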