I have the code below, which calculates the distance between the coordinates of the city where a public-transport trip starts and the coordinates of the city where it ends. The number of unique from-city/to-city combinations is limited, but my data set has around 1.2 million records, and the code is slow because it recomputes the distance for every row. How can I rearrange the loop so it calculates the distance once per unique combination and applies the result to the repeated ones? Is there any approach that takes less processing time?
from geopy.distance import geodesic

df_distance = []
for _, row in clean_df.iterrows():
    try:
        coords_1 = (row.Lat_x, row.Lng_x)  # trip start: city coordinates
        coords_2 = (row.Lat_y, row.Lng_y)  # trip end: city coordinates
        distance = geodesic(coords_1, coords_2).km
        df_distance.append(distance)
    except ValueError:
        print(row)  # log rows with invalid coordinates
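The deduplication idea from the question can be sketched as follows: compute the distance once per unique origin/destination coordinate pair with `drop_duplicates`, then `merge` the results back onto the full frame so repeated combinations reuse the precomputed value. This is a minimal sketch, not the original code: the column names `Lat_x`/`Lng_x`/`Lat_y`/`Lng_y` are taken from the snippet above, and a haversine helper stands in for geopy's `geodesic().km` so the example is self-contained (geodesic would be a drop-in replacement and slightly more accurate).

```python
import math

import pandas as pd


def haversine_km(lat1, lng1, lat2, lng2):
    # Great-circle distance in km; a stand-in for geopy's geodesic(...).km
    r = 6371.0088  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def add_distances(clean_df):
    cols = ["Lat_x", "Lng_x", "Lat_y", "Lng_y"]
    # Compute each unique origin/destination combination only once
    unique = clean_df[cols].drop_duplicates().copy()
    unique["distance_km"] = [
        haversine_km(*row) for row in unique.itertuples(index=False)
    ]
    # Broadcast the result back onto every repeated combination
    return clean_df.merge(unique, on=cols, how="left")
```

With 1.2 million rows but far fewer unique city pairs, this turns millions of distance calls into one call per unique pair plus a single vectorized merge.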