I would like to create a link list of the transport routes from a GTFS database snapshot. Essentially, I need a list of bus lines where you can switch from one line to another through a common stop.
I have the stops precleaned in a dataframe:
import pandas as pd

stops = pd.DataFrame({
    'stop_id': ['002133', '003002', '003118', '003209', '004521', '004716', '004903', '006390', '007177', '007289'],
    'stop_name': ['Örs vezér tere M+H, déli tárolótér', 'Puskás Ferenc Stadion', 'Óbuda, Bogdáni út', 'Batthyány tér', 'Puskás Ferenc Stadion', 'ÉD metró járműtelep,porta', 'Örs vezér tere', 'Cinkota kocsiszín', 'Csepel kocsiszín', 'Vihar utca'],
    'location': [(47.500366, 19.1357), (47.500368, 19.103406), (47.551471, 19.041971), (47.506776, 19.039318), (47.50017, 19.104773), (47.469651, 19.12909), (47.503585, 19.137192), (47.519345, 19.217072), (47.421498, 19.066247), (47.434399, 19.035664)]
})
... and I tried to generate a new list of dictionaries by comparing every row to every row of the dataframe:
import geopy.distance

data = [{'stop1': row1[1],
         'stop2': row2[1],
         'stop1_id': row1[2],
         'stop2_id': row2[2],
         'stop1_location': row1[0],
         'stop2_location': row2[0]}
        for row1 in zip(stops.location, stops.stop_name, stops.stop_id)
        for row2 in zip(stops.location, stops.stop_name, stops.stop_id)
        # keep the pair if the names differ but the stops are within 500 m, or if the names match
        if (row1[1] != row2[1] and geopy.distance.geodesic(row1[0], row2[0]).meters < 500)
        or row1[1] == row2[1]]
The output would be similar to this:
stop1, stop2, stop1_id, stop2_id, stop1_location, stop2_location
'Örs vezér tere M+H, déli tárolótér', 'Örs vezér tere', '002133', '004903', (47.500366, 19.1357), (47.503585, 19.137192)
The logic: if two stops have different names and their distance is less than 500 meters, OR their names are the same, THEN append their properties to a list of dicts.
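For reference, here is the same pairing logic written with itertools.combinations (just a sketch; it skips self-pairs and duplicate orderings, so it roughly halves the number of comparisons, but the geodesic call per pair is still the bottleneck):

import itertools
import geopy.distance

rows = list(stops.itertuples(index=False))  # namedtuples with stop_id, stop_name, location
data = []
for r1, r2 in itertools.combinations(rows, 2):
    # same name, or different names but within 500 m of each other
    if r1.stop_name == r2.stop_name or geopy.distance.geodesic(r1.location, r2.location).meters < 500:
        data.append({'stop1': r1.stop_name, 'stop2': r2.stop_name,
                     'stop1_id': r1.stop_id, 'stop2_id': r2.stop_id,
                     'stop1_location': r1.location, 'stop2_location': r2.location})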
The problem is that this runs practically forever, even though the table has only 6000 rows. How can I optimize the code so it doesn't run forever? Is there a better solution for this?
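Something like a radius query on a spatial index is what I have in mind, but I am not sure it is the right approach. A minimal sketch using scikit-learn's BallTree with a haversine metric (the 6371000 m earth radius used to convert 500 m into radians is my assumption, and same-name stops farther apart than 500 m would still need a separate pass):

import numpy as np
from sklearn.neighbors import BallTree

coords = np.radians(np.array(stops['location'].tolist()))  # (lat, lon) pairs in radians
tree = BallTree(coords, metric='haversine')
# for every stop, the indices of all stops within 500 m (haversine works on the unit sphere)
neighbors = tree.query_radius(coords, r=500 / 6371000)

pairs = []
for i, idxs in enumerate(neighbors):
    for j in idxs:
        if i < j:  # each unordered pair only once, no self-pairs
            pairs.append({'stop1': stops.stop_name.iloc[i], 'stop2': stops.stop_name.iloc[j],
                          'stop1_id': stops.stop_id.iloc[i], 'stop2_id': stops.stop_id.iloc[j],
                          'stop1_location': stops.location.iloc[i], 'stop2_location': stops.location.iloc[j]})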
My final goal is to build a graph visualization of these connections with networkx.
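In case it matters for the answer, the visualization I am aiming for is roughly this (a sketch over the data list built above; placing the nodes at their geographic coordinates is my choice, not a requirement):

import networkx as nx
import matplotlib.pyplot as plt

G = nx.Graph()
for d in data:
    G.add_node(d['stop1_id'], name=d['stop1'], location=d['stop1_location'])
    G.add_node(d['stop2_id'], name=d['stop2'], location=d['stop2_location'])
    G.add_edge(d['stop1_id'], d['stop2_id'])

# draw with longitude as x and latitude as y
pos = {n: (loc[1], loc[0]) for n, loc in nx.get_node_attributes(G, 'location').items()}
nx.draw(G, pos=pos, node_size=20, with_labels=False)
plt.show()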