I have a problem similar to the one in "How to get the distance between two geographic coordinates of two different dataframes?". I have two dataframes:
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3],
                    'lat': [-23.48, -22.94, -23.22],
                    'long': [-46.36, -45.40, -45.80]})
df2 = pd.DataFrame({'id': [100, 200, 300],
                    'lat': [-28.48, -22.94, -23.22],
                    'long': [-46.36, -46.40, -45.80]})
My question is: using the solution suggested by Ben.T there, how can I add rows from df2 to df1 when a point from df2 is not near any point in df1? I think it could be based on that matrix of distances:
import numpy as np
from sklearn.metrics.pairwise import haversine_distances

# threshold in meters, change as needed
threshold = 100  # meters
# another parameter
earth_radius = 6371000  # meters

distance_matrix = (
    # get the distance between all points of each DF
    haversine_distances(
        # note that you need to convert to radians with *np.pi/180
        X=df1[['lat', 'long']].to_numpy() * np.pi / 180,
        Y=df2[['lat', 'long']].to_numpy() * np.pi / 180)
    # get the distance in meters
    * earth_radius
    # compare to your threshold
    < threshold
    # **here I want to add rows from df2 to df1 if point from df2 is NOT near df1**
)
For example, the expected output would look like this:
id lat long
1 -23.48 -46.36
2 -22.94 -45.40
3 -23.22 -45.80
4 -28.48 -46.36
5 -22.94 -46.40
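
One possible way to get there, as a minimal sketch rather than a definitive solution: the boolean matrix has one row per df1 point and one column per df2 point, so a df2 point is "not near df1" when its whole column is False. The names `near`, `far_from_all` and `result` are my own, and renumbering `id` from 1 is an assumption based on the expected output above (the original df2 ids 100 and 200 are discarded).

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import haversine_distances

df1 = pd.DataFrame({'id': [1, 2, 3],
                    'lat': [-23.48, -22.94, -23.22],
                    'long': [-46.36, -45.40, -45.80]})
df2 = pd.DataFrame({'id': [100, 200, 300],
                    'lat': [-28.48, -22.94, -23.22],
                    'long': [-46.36, -46.40, -45.80]})

threshold = 100         # meters
earth_radius = 6371000  # meters

# near[i, j] is True when df1 row i and df2 row j are closer than threshold
near = (
    haversine_distances(
        X=np.radians(df1[['lat', 'long']].to_numpy()),
        Y=np.radians(df2[['lat', 'long']].to_numpy()))
    * earth_radius
    < threshold
)

# a df2 point is kept only if it is near NO point of df1 (whole column False)
far_from_all = ~near.any(axis=0)

# append those rows to df1; renumbering id is an assumption taken from
# the expected output, drop the next-but-one line to keep the df2 ids
result = pd.concat([df1, df2[far_from_all]], ignore_index=True)
result['id'] = np.arange(1, len(result) + 1)

print(result)
```

With the sample data, the df2 row with id 300 has the same coordinates as df1 id 3, so it is dropped, while the rows with ids 100 and 200 are appended. Note that the full distance matrix has len(df1) * len(df2) entries, so for large dataframes a spatial index may be a better fit than this dense approach.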