Context:

  • I have one city dataset with coordinates (lat, long)
  • I have several other specific datasets (hospitals, shops,...) with coordinates (lat, long) too

My objective is to find, for each city, the closest point (or the N closest points) in each of the other datasets.

Code:

I defined a function to calculate a Haversine distance:

from math import radians, sin, cos, asin, sqrt

def dist(lat1, long1, lat2, long2):
   # convert decimal degrees to radians
   lat1, long1, lat2, long2 = map(radians, [lat1, long1, lat2, long2])
   # haversine formula
   dlon = long2 - long1
   dlat = lat2 - lat1
   a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
   c = 2 * asin(sqrt(a))
   # mean radius of the Earth in kilometers is about 6371
   km = 6371 * c
   return km
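
For comparison, the same formula can be sketched with NumPy so that it accepts whole arrays instead of one pair of points at a time. The name `dist_vec` is hypothetical and not part of my code above; it assumes inputs in decimal degrees and keeps the same 6371 km radius.

```python
import numpy as np

def dist_vec(lat1, long1, lat2, long2):
    # Hypothetical vectorized variant of dist(); works on scalars and arrays.
    # Convert decimal degrees to radians.
    lat1, long1, lat2, long2 = map(np.radians, (lat1, long1, lat2, long2))
    dlon = long2 - long1
    dlat = lat2 - lat1
    # Haversine formula, evaluated element-wise
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    # 6371 km is the assumed mean Earth radius, as in dist()
    return 6371 * 2 * np.arcsin(np.sqrt(a))
```

Called with one city and a whole column of candidates (for example `hospital['rech_lat'].to_numpy()`), this would return all distances in one call rather than one per row.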

Now I use this function to find the nearest point:

def find_nearest(lat, long, recherche):
   distances = recherche.apply(
       lambda x: dist(lat, long, x['rech_lat'], x['rech_lon']), 
       axis=1)
   return recherche.loc[distances.idxmin(), 'rech_id']

Which I call like this:

CITY['hospital_id'] = CITY.apply(lambda x: find_nearest(x['COM_LAT'], x['COM_LONG'],hospital),axis=1)
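
To make the intent concrete, here is a minimal sketch of the same search done as one broadcasted computation instead of nested `apply` calls. The function name `nearest_ids` and the sample coordinates are made up for illustration; only the idea (one distance matrix, then a row-wise minimum) is the point.

```python
import numpy as np

def nearest_ids(city_lat, city_lon, rech_lat, rech_lon, rech_id):
    # Hypothetical helper: city coordinates become a column and candidate
    # coordinates a row, so broadcasting yields a
    # (n_cities, n_candidates) matrix of pairwise Haversine distances.
    lat1 = np.radians(np.asarray(city_lat, dtype=float))[:, None]
    lon1 = np.radians(np.asarray(city_lon, dtype=float))[:, None]
    lat2 = np.radians(np.asarray(rech_lat, dtype=float))[None, :]
    lon2 = np.radians(np.asarray(rech_lon, dtype=float))[None, :]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    km = 6371 * 2 * np.arcsin(np.sqrt(a))
    # For each city (row), pick the candidate (column) with the smallest distance
    return np.asarray(rech_id)[km.argmin(axis=1)]

# Made-up sample coordinates, for illustration only
city_lat = [48.85, 43.30]
city_lon = [2.35, 5.37]
hosp_lat = [43.29, 48.86, 45.76]
hosp_lon = [5.40, 2.34, 4.84]
hosp_id = ["H1", "H2", "H3"]

print(nearest_ids(city_lat, city_lon, hosp_lat, hosp_lon, hosp_id))
```

With the dataframes above, the arguments would be columns such as `CITY['COM_LAT'].to_numpy()` and `hospital['rech_id'].to_numpy()`. The full matrix holds one value per city/candidate pair, so for large datasets it would probably have to be computed in chunks of cities.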

Problem:

Doing so, I need to pass the hospital dataframe on every call, and I am not sure that performs well. I thought of passing the dataframe's name as a string and resolving it with the eval function instead:

def find_nearest(lat, long, recherch):
   recherche = eval(recherch)
   distances = recherche.apply(
       lambda x: dist(lat, long, x['rech_lat'], x['rech_lon']), 
       axis=1)
   return recherche.loc[distances.idxmin(), 'rech_id']

CITY['hospital_id'] = CITY.apply(lambda x: find_nearest(x['COM_LAT'], x['COM_LONG'],'hospital'),axis=1)

Is this better? It is still slow. Do you know how I can improve it further?

Thanks for your answers.

  • Your question needs a minimal reproducible example consisting of sample input, expected output, actual output, and only the relevant code necessary to reproduce the problem. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for best practices related to Pandas questions. – itprorh66 Dec 01 '22 at 19:18
