I have a dataframe with three columns: id, latitude and longitude. For each row, I need to find the other rows that lie within fixed distance thresholds (1, 5, 10 and 50 km).
My current solution is a double for loop, and I'm looking for a more efficient implementation.
Here's my current code:
import pandas as pd

def distance(coord1, coord2):
    ...
    return float_distance_in_km

df = pd.read_csv("coordinates.csv", na_values=None)

# One list of neighbor ids per row, for each distance band.
lessThan1 = list()
lessThan5 = list()
lessThan10 = list()
lessThan50 = list()

for i in range(0, len(df)):
    lessThan1_row = list()
    lessThan5_row = list()
    lessThan10_row = list()
    lessThan50_row = list()
    # pandas reads missing values as NaN, so test with pd.notna, not "is not None".
    if pd.notna(df['longitude'][i]) and pd.notna(df['latitude'][i]):
        coords_1 = (df['longitude'][i], df['latitude'][i])
        for j in range(0, len(df)):
            if i == j:
                continue
            if pd.isna(df['longitude'][j]) or pd.isna(df['latitude'][j]):
                continue
            coords_2 = (df['longitude'][j], df['latitude'][j])
            dist = distance(coords_1, coords_2)
            neighbor = df['id'][j]
            # Bands are exclusive: a neighbor goes only into the first band it fits.
            if dist < 1:
                lessThan1_row.append(neighbor)
            elif dist < 5:
                lessThan5_row.append(neighbor)
            elif dist < 10:
                lessThan10_row.append(neighbor)
            elif dist < 50:
                lessThan50_row.append(neighbor)
    # Rows with missing coordinates get empty lists.
    lessThan1.append(lessThan1_row)
    lessThan5.append(lessThan5_row)
    lessThan10.append(lessThan10_row)
    lessThan50.append(lessThan50_row)

df["1km"] = lessThan1
df["5km"] = lessThan5
df["10km"] = lessThan10
df["50km"] = lessThan50
The dataframe output is not mandatory; I just happen to have the dataset loaded as a dataframe.
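
For illustration, one direction that might replace the double loop is to compute all pairwise distances at once with NumPy broadcasting. The following is only a minimal sketch under stated assumptions: it assumes the elided distance function is a haversine (great-circle) distance in km, and the names pairwise_haversine_km, neighbors_by_band and EARTH_RADIUS_KM are hypothetical, not part of the code above. It keeps the exclusive bands (under 1 km, 1 to 5 km, and so on) and returns empty lists for rows with missing coordinates.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def pairwise_haversine_km(lat_deg, lon_deg):
    """Return the n x n matrix of great-circle distances in km (NaN where coordinates are missing)."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Broadcasting a column against a row gives every pairwise difference.
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2.0) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2.0) ** 2)
    # Clip guards against rounding pushing the value slightly outside [0, 1].
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def neighbors_by_band(ids, lat_deg, lon_deg, edges=(1, 5, 10, 50)):
    """Map each band label (e.g. "5km") to one list of neighbor ids per row."""
    ids = np.asarray(ids)
    dist = pairwise_haversine_km(lat_deg, lon_deg)
    np.fill_diagonal(dist, np.nan)  # a row is never its own neighbor
    result = {}
    lower = 0.0
    for upper in edges:
        # Comparisons with NaN are False, so missing coordinates yield no neighbors.
        with np.errstate(invalid="ignore"):
            mask = (dist >= lower) & (dist < upper)
        result[f"{upper}km"] = [ids[row].tolist() for row in mask]
        lower = upper
    return result
```

With the dataframe above, something like `bands = neighbors_by_band(df["id"], df["latitude"], df["longitude"])` followed by `df["1km"] = bands["1km"]` would fill the same columns. Note that the full distance matrix needs memory proportional to the square of the row count, so for large datasets a spatial index (for example scikit-learn's BallTree with the haversine metric) may be a better fit.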