Given a dataframe df as follows:
    id              location        lon       lat
0    1            Onyx Spire  116.35425  39.87760
1    2        Unison Lookout  116.44333  39.93237
2    3       History Lookout  116.14857  39.73727
3    4     Domination Pillar  116.46387  39.96286
4    5           Union Tower  116.36373  39.95064
5    6   Ruby Forest Obelisk  116.35786  39.89463
6    7      Rust Peak Pillar  116.34870  39.98170
7    8      Ash Forest Tower  116.38461  39.94938
8    9  Prestige Mound Tower  116.34052  39.98977
9   10  Sapphire Mound Tower  116.35063  39.92982
10  11       Kinship Lookout  116.43020  39.99997
11  12    Exhibition Obelisk  116.45108  39.94371
For each location, I need to find the names of the other locations whose distance from it is less than or equal to, say, 5 km.
The code below is based on answers from this link:
import pandas as pd
from math import sin, cos, sqrt, atan2, radians
from scipy.spatial import distance

def get_distance(point1, point2):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    R = 6370  # approximate Earth radius in km
    lat1 = radians(point1[0])
    lon1 = radians(point1[1])
    lat2 = radians(point2[0])
    lon2 = radians(point2[1])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    return R * c  # returned directly so no local name shadows scipy's `distance` module

# Pairwise distance matrix (km) between all locations
all_points = df[['lat', 'lon']].values
dm = distance.cdist(all_points, all_points, get_distance)
pd.DataFrame(dm, index=df.index, columns=df.index)
Out:
0 1 2 ... 9 10 11
0 0.000000 9.736316 23.494395 ... 5.813891 15.066709 11.054762
1 9.736316 0.000000 33.222475 ... 7.908015 7.598415 1.423357
2 23.494395 33.222475 0.000000 ... 27.492814 37.822285 34.549129
3 13.312235 3.815179 36.787014 ... 10.327235 5.024900 2.391864
4 8.160542 7.082601 30.000842 ... 2.569988 7.883467 7.484839
5 1.918235 8.409888 25.009951 ... 3.960618 13.235325 9.641336
6 11.583243 9.752599 32.096627 ... 5.770232 7.233093 9.692770
7 8.389761 5.350670 31.017383 ... 3.622002 6.835323 5.700434
8 12.525586 10.838805 32.501864 ... 6.720541 7.722060 10.722467
9 5.813891 7.908015 27.492814 ... 0.000000 10.334273 8.701063
10 15.066709 7.598415 37.822285 ... 10.334273 0.000000 6.502921
11 11.054762 1.423357 34.549129 ... 8.701063 6.502921 0.000000
But I would like to get an output similar to the following dataframe. Please note that location1, location2, location3 are the names of the locations whose distance from location is <= 5 km (the paired location names may not be accurate; they are only examples to aid understanding). If a value is NaN, then no such location exists:
id location ... location2 location3
0 1 Onyx Spire ... NaN NaN
1 2 Unison Lookout ... NaN NaN
2 3 History Lookout ... NaN NaN
3 4 Domination Pillar ... NaN NaN
4 5 Union Tower ... NaN NaN
5 6 Ruby Forest Obelisk ... NaN NaN
6 7 Rust Peak Pillar ... NaN NaN
7 8 Ash Forest Tower ... Kinship Lookout NaN
8 9 Prestige Mound Tower ... NaN NaN
9 10 Sapphire Mound Tower ... NaN NaN
10 11 Kinship Lookout ... Ruby Forest Obelisk Domination Pillar
11 12 Exhibition Obelisk ... NaN NaN
How could I do that in Python? Thanks.
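To make the goal concrete, here is a minimal sketch of one possible approach, not a definitive solution. It assumes the same haversine formula and radius (6370 km) as get_distance above, but computes the whole matrix with NumPy broadcasting instead of cdist. The names haversine_matrix, max_km, neighbours, wide and result are hypothetical and only used for illustration. Neighbours are listed in row order rather than sorted by distance.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": range(1, 13),
    "location": [
        "Onyx Spire", "Unison Lookout", "History Lookout", "Domination Pillar",
        "Union Tower", "Ruby Forest Obelisk", "Rust Peak Pillar",
        "Ash Forest Tower", "Prestige Mound Tower", "Sapphire Mound Tower",
        "Kinship Lookout", "Exhibition Obelisk",
    ],
    "lon": [116.35425, 116.44333, 116.14857, 116.46387, 116.36373, 116.35786,
            116.34870, 116.38461, 116.34052, 116.35063, 116.43020, 116.45108],
    "lat": [39.87760, 39.93237, 39.73727, 39.96286, 39.95064, 39.89463,
            39.98170, 39.94938, 39.98977, 39.92982, 39.99997, 39.94371],
})


def haversine_matrix(lat_deg, lon_deg, radius_km=6370.0):
    """Pairwise haversine distances in km for points given in degrees."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    a = np.clip(a, 0.0, 1.0)  # guard against tiny floating-point overshoot
    return radius_km * 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))


max_km = 5  # distance threshold in km (assumed from the question)

dm = haversine_matrix(df["lat"], df["lon"])
within = dm <= max_km
np.fill_diagonal(within, False)  # a location is not its own neighbour

# One list of neighbour names per row, in row order.
names = df["location"].to_numpy()
neighbours = [names[row].tolist() for row in within]

# Pad the ragged lists with NaN so they fit a rectangular table.
width = max((len(n) for n in neighbours), default=0)
table = np.full((len(df), width), np.nan, dtype=object)
for i, n in enumerate(neighbours):
    table[i, :len(n)] = n

wide = pd.DataFrame(
    table,
    index=df.index,
    columns=[f"location{k + 1}" for k in range(width)],
)
result = df.join(wide)
print(result)
```

The number of locationN columns is whatever the most crowded row needs, so it may be more than three for this data. Rows with no neighbour within max_km are NaN in every locationN column.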