I have a table of locations (right now in dataframe) and want to calculate all combinations and their distance from eachother.
Input:
ID | Lat | Lon |
---|---|---|
1 | 6,4355 | 53,2245 |
2 | 5,3434 | 50,2345 |
3 | 4,3434 | 51,2345 |
Desired Outcome:
ID1 | ID2 | distance |
---|---|---|
1 | 1 | 0 |
1 | 2 | 1 |
1 | 3 | 2 |
2 | 1 | 0 |
2 | 2 | 3 |
2 | 3 | 4 |
3 | 1 | 0 |
3 | 2 | 5 |
3 | 3 | 6 |
def distance(lat1, lon1, lat2, lon2):
lat1 = radians(lat1)
lon1 = radians(lon1)
lat2 = radians(lat2)
lon2 = radians(lon2)
dlon = lon2 - lon1
dlat = lat2 - lat1
R = 6373.0
a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
c = 2 * atan2(sqrt(a), sqrt(1 - a))
return round(R * c)
Right now i loop through the dataframe 2x in such an ugly way that i'm not even going to show, but it works. Problem is that it is terribly slow when the table gets big and i know there must be a faster way to do this.
If i can do this in standard python/pandas/numpy (as long as its fast and i dont have to use obscure packages!) Any help would be much appreciated Oh and i want to filter on distance < 10km, forgot to add!!
Here my current code i want to improve:
df_distance = pandas.DataFrame(columns=['ID1', 'ID2', 'distance'])
""" first all id with themselves """
for index, row in df.iterrows():
df_new_row = pandas.DataFrame([{'ID1': row['ID'], 'ID2': row['ID'],
'distance': 0, 'lat1': row['Lat'], 'lon1': row['Lon'],
'lat2': row['Lat'], 'lon2': row['Lon']}])
df_distance = pandas.concat([df_distance, df_new_row])
for index1, row1 in df.iterrows():
for index2, row2 in df.iterrows():
if index2 > index1:
dist = distance(row1['Lat'], row1['Lon'], row2['Lat'], row2['Lon'])
if dist <= 10: # filter at lower than 10km
""" add both directions """
df_new_row = pandas.DataFrame([{'ID1': row1['ID'], 'ID2': row2['ID'],
'distance': dist, 'lat1': row1['Lat'], 'lon1': row1['Lon'],
'lat2': row2['Lat'], 'lon2': row2['Lon']},
{'ID1': row2['ID'], 'ID2': row1['ID'],
'distance': dist, 'lat1': row2['Lat'], 'lon1': row2['Lon'],
'lat2': row1['Lat'], 'lon2': row1['Lon']}
])
df_distance = pandas.concat([df_distance, df_new_row])