I have a bike-sharing data set in which each station has a latitude and a longitude. A sample of the data is shown below. I want to find each group of 3 stations that are closest to each other by coordinates, and sum the count for each group (the 3 closest points).
I know how to calculate the distance between two points, but I don't know how to program the search for each subset of the 3 closest coordinates.
The code for calculating the distance between 2 points:
from math import cos, asin, sqrt, pi

def distance(lat1, lon1, lat2, lon2):
    p = pi/180
    a = 0.5 - cos((lat2-lat1)*p)/2 + cos(lat1*p) * cos(lat2*p) * (1-cos((lon2-lon1)*p))/2
    return 12742 * asin(sqrt(a))
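For reference, the function above appears to be the haversine formula, rewritten with the identity sin²(x/2) = (1 − cos x)/2. Assuming a mean Earth radius of R = 6371 km (so 2R = 12742 km, the constant in the return statement), and writing φ for latitude and λ for longitude in radians, the correspondence would be:

```latex
\begin{aligned}
a &= \sin^2\!\left(\frac{\varphi_2-\varphi_1}{2}\right)
   + \cos\varphi_1\,\cos\varphi_2\,\sin^2\!\left(\frac{\lambda_2-\lambda_1}{2}\right) \\
  &= \frac{1-\cos(\varphi_2-\varphi_1)}{2}
   + \cos\varphi_1\,\cos\varphi_2\,\frac{1-\cos(\lambda_2-\lambda_1)}{2}, \\
d &= 2R\,\arcsin\sqrt{a}
\end{aligned}
```

So the result `d` should be in kilometres.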
This is the code I tried for finding the closest subset of 3, but it gives me an error:
from itertools import combinations

best = (float('inf'), None)

for combination in combinations(range(len(data)), 3):
    total_distance = 0
    for idx_1, idx_2 in [(0, 1), (1, 2), (0, 2)]:
        total_distance += distance(
            combination[idx_1]['start_station_latitude'],
            combination[idx_1]['start_station_longitude'],
            combination[idx_2]['start_station_latitude'],
            combination[idx_2]['start_station_longitude'],
        )
    if total_distance < best[0]:
        best = (total_distance, combination)

print(f'Best combination is {best[1]}, total distance: {best[0]}')
The data:
    start_station_name         start_station_latitude  start_station_longitude  count
0   Schous plass               59.920259               10.760629                2
1   Pilestredet                59.926224               10.729625                4
2   Kirkeveien                 59.933558               10.726426                8
3   Hans Nielsen Hauges plass  59.939244               10.774319                0
4   Fredensborg                59.920995               10.750358                8
5   Marienlyst                 59.932454               10.721769                9
6   Sofienbergparken nord      59.923229               10.766171                3
7   Stensparken                59.927140               10.730981                4
8   Vålerenga                  59.908576               10.786856                6
9   Schous plass trikkestopp   59.920728               10.759486                5
10  Griffenfeldts gate         59.933703               10.751930                4
11  Hallénparken               59.931530               10.762169                8
12  Alexander Kiellands Plass  59.928058               10.751397                3
13  Uranienborgparken          59.922485               10.720896                2
14  Sommerfrydhagen            59.911453               10.776072                1
15  Vestkanttorvet             59.924403               10.713069                8
16  Bislettgata                59.923834               10.734638                9
17  Biskop Gunnerus' gate      59.912334               10.752292                1
18  Botanisk Hage sør          59.915282               10.769620                1
19  Hydroparken                59.914145               10.715505                1
20  Bøkkerveien                59.927375               10.796015                1
What I want is a result in this format:
closest                                  count_sum
Schous plass, Pilestredet, Kirkeveien    14
.
.
.
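One possible reading of this output is "for every station, take the station plus its two nearest neighbours and sum their counts". The following is only a minimal sketch under that assumption. The `stations` list is a hypothetical stand-in holding six rows copied from the sample, and `closest_groups` is a helper name made up for illustration:

```python
from math import asin, cos, pi, sqrt


def distance(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres (same formula as in the question)."""
    p = pi / 180
    a = (0.5 - cos((lat2 - lat1) * p) / 2
         + cos(lat1 * p) * cos(lat2 * p) * (1 - cos((lon2 - lon1) * p)) / 2)
    return 12742 * asin(sqrt(a))


# Hypothetical stand-in for the real data: (name, latitude, longitude, count).
stations = [
    ("Schous plass", 59.920259, 10.760629, 2),
    ("Pilestredet", 59.926224, 10.729625, 4),
    ("Kirkeveien", 59.933558, 10.726426, 8),
    ("Sofienbergparken nord", 59.923229, 10.766171, 3),
    ("Stensparken", 59.927140, 10.730981, 4),
    ("Schous plass trikkestopp", 59.920728, 10.759486, 5),
]


def closest_groups(stations, k=2):
    """For each station, pair it with its k nearest neighbours and sum the counts."""
    result = []
    for i, (name, lat, lon, count) in enumerate(stations):
        # Every station except the current one, sorted by distance to it.
        others = [s for j, s in enumerate(stations) if j != i]
        nearest = sorted(others, key=lambda s: distance(lat, lon, s[1], s[2]))[:k]
        group = [name] + [s[0] for s in nearest]
        count_sum = count + sum(s[3] for s in nearest)
        result.append((", ".join(group), count_sum))
    return result


groups = closest_groups(stations)
for closest, count_sum in groups:
    print(f"{closest:<70} {count_sum}")
```

This compares every station with every other one, which should be fine for a few hundred stations. Note that the same three stations can show up in more than one row (once per member), so deduplication may be needed depending on what the sums are used for.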
The Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-49-1a4d3a72c23d> in <module>
7 for idx_1, idx_2 in [(0, 1), (1, 2), (0, 2)]:
8 total_distance += distance(
----> 9 combination[idx_1]['start_station_latitude'],
10 combination[idx_1]['start_station_longitude'],
11 combination[idx_2]['start_station_latitude'],
TypeError: 'int' object is not subscriptable
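Judging from the traceback, the likely cause is that `combinations(range(len(data)), 3)` yields tuples of integer positions such as `(0, 1, 2)`, so `combination[idx_1]` is an `int` and cannot be indexed with a column name. A minimal sketch of one way around this is to iterate over the rows themselves. The `rows` list below is a hypothetical stand-in for `data`, built from five rows of the sample; if `data` is a pandas DataFrame (an assumption), `data.to_dict('records')` should produce the same shape:

```python
from itertools import combinations
from math import asin, cos, pi, sqrt


def distance(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres (same formula as in the question)."""
    p = pi / 180
    a = (0.5 - cos((lat2 - lat1) * p) / 2
         + cos(lat1 * p) * cos(lat2 * p) * (1 - cos((lon2 - lon1) * p)) / 2)
    return 12742 * asin(sqrt(a))


# Hypothetical stand-in for `data`: one dict per station, using the same column names.
# Assumption: with a pandas DataFrame, rows = data.to_dict('records') gives this shape.
columns = ('start_station_name', 'start_station_latitude',
           'start_station_longitude', 'count')
rows = [dict(zip(columns, values)) for values in [
    ('Schous plass', 59.920259, 10.760629, 2),
    ('Pilestredet', 59.926224, 10.729625, 4),
    ('Kirkeveien', 59.933558, 10.726426, 8),
    ('Sofienbergparken nord', 59.923229, 10.766171, 3),
    ('Schous plass trikkestopp', 59.920728, 10.759486, 5),
]]

best = (float('inf'), None)

# Iterate over triples of rows instead of triples of integer positions.
for combination in combinations(rows, 3):
    total_distance = 0
    for idx_1, idx_2 in [(0, 1), (1, 2), (0, 2)]:
        total_distance += distance(
            combination[idx_1]['start_station_latitude'],
            combination[idx_1]['start_station_longitude'],
            combination[idx_2]['start_station_latitude'],
            combination[idx_2]['start_station_longitude'],
        )
    if total_distance < best[0]:
        best = (total_distance, combination)

best_names = [row['start_station_name'] for row in best[1]]
count_sum = sum(row['count'] for row in best[1])
print(f'Best combination is {best_names}, total distance: {best[0]:.3f} km, '
      f'count_sum: {count_sum}')
```

This only removes the `TypeError` and reports the single tightest triple. It checks every possible triple, so the number of combinations grows roughly with the cube of the number of stations.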