I have a pandas dataframe A, with latitude longitudes.
import pandas as pd
df_a = pd.DataFrame([['b',1.591797,103.857887],
['c',1.589416, 103.865322]],
columns = ['place','lat','lng'])
I have another dataframe of locations B, also with latitude longitudes.
df_b = pd.DataFrame([['ref1',1.594832, 103.853703],
['ref1',1.589749, 103.864678]],
columns = ['place','lat','lng'])
For every row in A, i want to find the closest matching row in B (subject to distance limit). --> I already have a function that calculates the distance between two GPS pairs
intended output
# a list where each row is the corresponding closest index in B
In [13]: min_index_arr
Out[13]: [0, 1]
One way of doing this is:
def haversine(pair1, pair2):
"""
Calculate the great circle distance between two points
on the earth (specified in decimal degrees)
"""
lon1, lat1 = pair1
lon2, lat2 = pair2
# convert decimal degrees to radians
lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
# haversine formula
dlon = lon2 - lon1
dlat = lat2 - lat1
a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
c = 2 * asin(sqrt(a))
r = 6371 # Radius of earth in kilometers. Use 3956 for miles
return c * r
import operator
min_vals = []
for i in df_a.index:
pair1 = df_a['lat'][i], df_a['lng'][i]
dist_array = []
for j in df_b.index:
pair2 = df_b['lat'][j], df_b['lng'][j]
dist = haversine(pair1, pair2)
dist_array.append(dist)
min_index, min_value = min(enumerate(dist_array), key=operator.itemgetter(1))
min_vals.append(max_index)
But im sure there's a faster way to do this, it seems very similar to an outer product, except not a product, and using a function. Does anyone know how?