I am working on a project for university, where I have two pandas dataframes:
# Libraries
import pandas as pd
from geopy import distance
# Dataframes
df1 = pd.DataFrame({'id': [1, 2, 3],
                    'lat': [-23.48, -22.94, -23.22],
                    'long': [-46.36, -45.40, -45.80]})
df2 = pd.DataFrame({'id': [100, 200, 300],
                    'lat': [-28.48, -22.94, -23.22],
                    'long': [-46.36, -46.40, -45.80]})
I need to calculate the distance between every latitude/longitude coordinate in df1 and every coordinate in df2, so I used geopy. If a point in df1 is less than a threshold of 100 meters away from any point in df2, I must assign the value 1 to its 'nearby' column. I wrote the following code:
threshold = 100  # meters
df1['nearby'] = 0
for i in range(0, len(df1)):
    for j in range(0, len(df2)):
        coord_geo_1 = (df1['lat'].iloc[i], df1['long'].iloc[i])
        coord_geo_2 = (df2['lat'].iloc[j], df2['long'].iloc[j])
        var_distance = (distance.distance(coord_geo_1, coord_geo_2).km) * 1000
        if var_distance < threshold:
            df1['nearby'].iloc[i] = 1
Although a warning appears, the code works. However, I would like to find a way to avoid the nested for loops. Is that possible?
# Output:
id     lat    long  nearby
 1  -23.48  -46.36       0
 2  -22.94  -45.40       0
 3  -23.22  -45.80       1
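One direction that might replace the two loops is to compute all pairwise distances at once with NumPy broadcasting. The following is only a minimal sketch under stated assumptions: it uses the haversine formula on a spherical Earth instead of geopy's ellipsoidal geodesic, so the distances can differ slightly from `distance.distance`, which may matter for points very close to the 100 m threshold. The names `haversine_matrix` and `EARTH_RADIUS_M` are hypothetical helpers, not part of geopy or pandas.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius (spherical approximation)


def haversine_matrix(lat1, lon1, lat2, lon2):
    """Return pairwise great-circle distances in meters.

    The result has shape (len(lat1), len(lat2)): entry [i, j] is the
    distance between point i of the first set and point j of the second.
    """
    # Column vectors for the first set, row vectors for the second,
    # so that arithmetic broadcasts to a full distance matrix.
    lat1 = np.radians(np.asarray(lat1, dtype=float))[:, None]
    lon1 = np.radians(np.asarray(lon1, dtype=float))[:, None]
    lat2 = np.radians(np.asarray(lat2, dtype=float))[None, :]
    lon2 = np.radians(np.asarray(lon2, dtype=float))[None, :]

    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    # Clip guards against tiny floating-point overshoot outside [0, 1].
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


df1 = pd.DataFrame({'id': [1, 2, 3],
                    'lat': [-23.48, -22.94, -23.22],
                    'long': [-46.36, -45.40, -45.80]})
df2 = pd.DataFrame({'id': [100, 200, 300],
                    'lat': [-28.48, -22.94, -23.22],
                    'long': [-46.36, -46.40, -45.80]})

threshold = 100  # meters
dist = haversine_matrix(df1['lat'], df1['long'], df2['lat'], df2['long'])

# A row of df1 is "nearby" if any df2 point lies within the threshold.
df1['nearby'] = (dist < threshold).any(axis=1).astype(int)
```

Because the whole column is assigned in one step, this sketch also does not use the element-wise `df1['nearby'].iloc[i] = 1` assignment. Note that the distance matrix needs memory proportional to `len(df1) * len(df2)`, so for large dataframes it would have to be computed in chunks.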