
I have a list of latitude/longitude pairs, approximately 5 million rows of data. I tried the code below to create a 25 ft buffer around each point and assign a new location ID to all the points that fall within that buffer. The only issue is performance. I am new to Python and dealing with a huge dataset, so any help is much appreciated!

import geopy.distance

# Sample_Data.Lat_Long holds one (lat, lon) pair per row
Coord_List = Sample_Data.Lat_Long.values.tolist()
Coord_List_E = [""] * len(Coord_List)  # location ID per point; "" = not assigned yet
k = 1

for i in range(len(Coord_List)):
    if Coord_List_E[i] == "":
        # point i starts group k; give k to every unassigned point within 25 ft of it
        # (j starts at i, so point i itself is assigned at distance 0)
        for j in range(i, len(Coord_List)):
            if Coord_List_E[j] == "" and geopy.distance.distance(Coord_List[i], Coord_List[j]).ft <= 25:
                Coord_List_E[j] = k
        k += 1

print(Coord_List_E)
ppwater
  • In order to use Numba, you would need a Numba implementation of the distance function. The method from Karney that you're using now is quite elaborate. For short distances like this, the Haversine approximation is faster and easier to implement, but whether it is good enough depends on the required accuracy. – Rutger Kassies Jan 08 '20 at 16:23
  • The question is a bit unclear. Could you specify what your input and output look like? Is your Coord_List a list of lat/lon pairs (what's the array shape)? What is your desired output: the total distance along all lat/lon entries, or distances between specific pairs? I could then adjust my answer accordingly... – FObersteiner Jan 09 '20 at 13:49
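Following Rutger Kassies' suggestion, here is a minimal pure-Python sketch of the Haversine distance that returns feet, so it can be compared directly with the 25 ft buffer. It assumes a spherical Earth with a mean radius of 6,371,008.8 m (an assumption; the answer below uses 6373 km), and the function name haversine_ft is hypothetical.

```python
from math import radians, sin, cos, asin, sqrt

# Assumption: spherical Earth with mean radius 6,371,008.8 m, converted to feet
EARTH_RADIUS_FT = 6_371_008.8 / 0.3048


def haversine_ft(lat1, lon1, lat2, lon2):
    """Great-circle distance in feet between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    # min() guards against a rounding error pushing sqrt(a) slightly above 1
    return 2 * EARTH_RADIUS_FT * asin(min(1.0, sqrt(a)))


print(haversine_ft(0.0, 0.0, 1.0, 0.0))  # roughly 364,800 ft for one degree of latitude
```

At a 25 ft scale the spherical approximation differs from geopy's ellipsoidal (Karney) result by a fraction of a percent, which may or may not matter depending on how strict the buffer has to be.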

1 Answer


You could compile a Python implementation of the Haversine distance with Numba's @njit decorator:

from math import sin, cos, sqrt, atan2, radians
from numba import njit

@njit
def calc_latlon_dist(lat, lon):
    """
    Sum of the Haversine distances (in km) between consecutive points
    of the lat/lon arrays (coordinates in degrees).
    """
    R = 6373.0 # approximate radius of earth in km
    dist = 0.

    for j in range(lat.shape[0]-1):
        lat0, lat1 = radians(lat[j]), radians(lat[j+1])
        lon0, lon1 = radians(lon[j]), radians(lon[j+1])

        dlon = lon1 - lon0
        dlat = lat1 - lat0

        a = sin(dlat / 2)**2 + cos(lat0) * cos(lat1) * sin(dlon / 2)**2
        c = 2 * atan2(sqrt(a), sqrt(1 - a))

        dist += R * c

    return dist

[Source]
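Note that calc_latlon_dist sums the distances between consecutive points, which is not the pairwise check the question's loop performs. As a hedged sketch (not the answerer's code), the question's greedy grouping could be written with a Haversine distance in place of geopy. The names haversine_ft and assign_location_ids are hypothetical, and a spherical Earth of mean radius 6,371,008.8 m is assumed.

```python
from math import radians, sin, cos, asin, sqrt

# Assumption: spherical Earth with mean radius 6,371,008.8 m, converted to feet
EARTH_RADIUS_FT = 6_371_008.8 / 0.3048


def haversine_ft(lat1, lon1, lat2, lon2):
    """Great-circle distance in feet between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    a = (sin((phi2 - phi1) / 2) ** 2
         + cos(phi1) * cos(phi2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_FT * asin(min(1.0, sqrt(a)))


def assign_location_ids(coords, radius_ft=25.0):
    """Greedy grouping as in the question: the first unassigned point i starts
    a new ID, and every later unassigned point within radius_ft of point i
    gets the same ID."""
    ids = [0] * len(coords)  # 0 means "not assigned yet"
    k = 0
    for i, (lat_i, lon_i) in enumerate(coords):
        if ids[i]:
            continue  # already grouped with an earlier point
        k += 1
        ids[i] = k
        for j in range(i + 1, len(coords)):
            if not ids[j] and haversine_ft(lat_i, lon_i, coords[j][0], coords[j][1]) <= radius_ft:
                ids[j] = k
    return ids


print(assign_location_ids([(40.0, -75.0), (40.00005, -75.0), (40.001, -75.0)]))
```

Like the original loop, each point is compared only with the point that started its group, so IDs do not chain: a point 36 ft from the group's first point gets a new ID even if it is 18 ft from another member. Decorating both functions with @njit and passing NumPy arrays instead of a list of tuples should make each comparison cheaper, but the loop stays quadratic, which in the worst case is about n(n-1)/2 ≈ 1.25 × 10¹³ distance evaluations for 5 million points.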

FObersteiner