I have a pandas DataFrame containing 500,000 (!) rows (locations) and two columns:
- Longitude
- Latitude
Now I want a third column:
- Nearest location
This column should tell me which row/location is nearest to the 'current' row/location.
I know you can find the distance between two lon/lat pairs using, for example, cdist from scipy.spatial.distance. However, this takes too much time, since it has to compute 500,000 * 500,000 distances (it finds the distance to every location, for every location).
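For reference, here is a minimal sketch of the brute-force approach described above, run on a small made-up sample. The coordinates are hypothetical, and using plain Euclidean distance on degrees is an assumption made only to keep the example short; it is a rough proxy for real distance on the globe.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

# Hypothetical sample data; the real frame has 500,000 rows.
df = pd.DataFrame({
    "Longitude": [4.90, 4.48, 5.12, 4.89],
    "Latitude": [52.37, 51.92, 52.09, 52.38],
})

coords = df[["Longitude", "Latitude"]].to_numpy()

# Full n x n distance matrix: this is the step that does not scale.
# Assumption: Euclidean distance on raw degrees, for illustration only.
dist = cdist(coords, coords)

# Ignore each point's zero distance to itself.
np.fill_diagonal(dist, np.inf)

# Row position of the nearest other location for every row.
df["Nearest location"] = dist.argmin(axis=1)
print(df)
```

With 500,000 rows the matrix would hold 2.5e11 entries, roughly 2 TB as float64, so this only works for small inputs.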
Does anyone know an appropriate way to deal with this?