I'm looking for the fastest way to get the distance between two latitude/longitude pairs. One pair comes from the user and the other from a marker. Below is my code:
import geopy.distance
import pandas as pd
marker = pd.read_csv(file_path)
coords_2 = (4.620881605,101.119911)
marker['Distance'] = round(geopy.distance.geodesic((marker['Latitude'].values,marker['Longitude'].values), (coords_2)).m,2)
Previously, I used apply, which is extremely slow:
marker['Distance2'] = marker.apply(lambda x: round(geopy.distance.geodesic((x.Latitude,x.Longitude), (coords_2)).m,2), axis = 1)
Then I tried pandas Series vectorization:
marker['Distance'] = round(geopy.distance.geodesic((marker['Latitude'].values,marker['Longitude'].values), (coords_2)).m,2)
I'm receiving this error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I added all() and any() to test (for example marker['Latitude'].values.all(), marker['Longitude'].values.all() and vice versa). However, the calculated result was entirely wrong with both any() and all().
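A possible explanation for the identical wrong value in every row, sketched with NumPy only: all() and any() reduce the whole column to a single boolean, so every row ends up being measured from the same point. If geopy then converts True to 1.0 (an assumption on my part, I have not checked its internals), the point becomes (1.0, 1.0), which is roughly 100 degrees of longitude away from coords_2 and would account for a distance of about 11,000 km. The sample values below are taken from my result table.

```python
import numpy as np

# Latitudes copied from the result table in this question
lat = np.array([4.620882, 4.620125, 4.619368])

# all() collapses the whole array into one boolean:
# True here, because every element is non-zero
collapsed = lat.all()

# A boolean converts to 1.0 when treated as a number,
# so the per-row latitude information is lost
as_number = float(collapsed)
```

So any() and all() silence the ValueError but discard the per-row coordinates, which is why they cannot give the right answer.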
This is my result:
Latitude Longitude Distance Distance2
0 4.620882 101.119911 11132307.42 0.00
1 4.620125 101.120399 11132307.42 99.72
2 4.619368 101.120885 11132307.42 199.26
where Distance is the result from the vectorized attempt, which is incorrect, whereas Distance2 is the result from apply, which is correct. Distance2 is my expected output.
Without using apply, I want to get the correct output faster.
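For reference, one direction that might work is a NumPy haversine formula, sketched below. This is not geopy's geodesic: it assumes a spherical Earth with a mean radius of 6371008.8 m, so its results can differ from the WGS-84 ellipsoid values by up to about 0.5%. The function name haversine_m and the constant EARTH_RADIUS_M are my own, and the sample coordinates are the three rows from the table above.

```python
import numpy as np

# Assumption: spherical Earth with the mean radius in metres
EARTH_RADIUS_M = 6371008.8

def haversine_m(lat, lon, ref_lat, ref_lon):
    """Great-circle distance in metres from each (lat, lon) to one reference point."""
    # Convert everything to radians; lat/lon may be arrays or pandas Series
    lat = np.radians(np.asarray(lat, dtype=float))
    lon = np.radians(np.asarray(lon, dtype=float))
    ref_lat = np.radians(ref_lat)
    ref_lon = np.radians(ref_lon)

    dlat = lat - ref_lat
    dlon = lon - ref_lon

    # Haversine formula, evaluated element-wise on the whole array
    a = np.sin(dlat / 2) ** 2 + np.cos(lat) * np.cos(ref_lat) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

# Sample rows from the result table, measured against coords_2
lats = np.array([4.620882, 4.620125, 4.619368])
lons = np.array([101.119911, 101.120399, 101.120885])
coords_2 = (4.620881605, 101.119911)

distances = np.round(haversine_m(lats, lons, coords_2[0], coords_2[1]), 2)
```

With the DataFrame from this question, the same call would look like marker['Distance'] = np.round(haversine_m(marker['Latitude'], marker['Longitude'], *coords_2), 2). The values should be close to Distance2 but not identical, because of the spherical assumption.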