I would first build a DataFrame that contains every unique pair of locations.
I don't know the exact term for this, but it is the Cartesian product of the rows with themselves, minus each row paired with itself and minus the mirrored duplicate (B, A) of every pair (A, B).
In matrix terms, what you need is the upper triangle of the pair matrix, excluding the diagonal.
>>> import numpy as np
>>> np.triu(1 - np.eye(len(df)))
array([[0., 1., 1., 1., 1.],
       [0., 0., 1., 1., 1.],
       [0., 0., 0., 1., 1.],
       [0., 0., 0., 0., 1.],
       [0., 0., 0., 0., 0.]])
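The row and column positions of the nonzero entries of this matrix give index pairs (i, j). Selecting rows by those positions and joining them side by side gives a DataFrame with every pair of locations exactly once.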
>>> i, j = np.where(np.triu(1 - np.eye(len(df))))
>>> df = (df.iloc[i].reset_index(drop=True)
...       .join(df.iloc[j].reset_index(drop=True), lsuffix='_x', rsuffix='_y'))
>>> df
id_x Latitude_x Longitude_x id_y Latitude_y Longitude_y
0 1 33.110348 -83.259740 2 33.500308 -86.792691
1 1 33.110348 -83.259740 3 30.428149 -92.326981
2 1 33.110348 -83.259740 4 33.493309 -86.828201
3 1 33.110348 -83.259740 5 36.433678 -89.025341
4 2 33.500308 -86.792691 3 30.428149 -92.326981
5 2 33.500308 -86.792691 4 33.493309 -86.828201
6 2 33.500308 -86.792691 5 36.433678 -89.025341
7 3 30.428149 -92.326981 4 33.493309 -86.828201
8 3 30.428149 -92.326981 5 36.433678 -89.025341
9 4 33.493309 -86.828201 5 36.433678 -89.025341
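The pairs selected above are the 2-element combinations of the rows, so the same index pairs can also be generated without building the matrix. A small sketch using only the standard library, with the ids taken from the example (the variable names are illustrative, not from the original code):

```python
from itertools import combinations

ids = [1, 2, 3, 4, 5]  # the id column from the example
n = len(ids)

# Every unordered pair of row positions, each exactly once (i < j).
positions = list(combinations(range(n), 2))

# The same positions read off the strict upper triangle of an n x n grid.
upper = [(i, j) for i in range(n) for j in range(n) if i < j]

# Map positions back to ids, mirroring the id_x / id_y columns.
id_pairs = [(ids[i], ids[j]) for i, j in positions]
print(id_pairs)
```

The two position lists are identical, so the i and j values from either approach should select the same rows with df.iloc.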
Each pair appears exactly once, with no self-pairs and no mirrored duplicates, so I can calculate the distance for each row.
>>> import haversine as hs
>>> df['result'] = df.apply(lambda row: hs.haversine([row['Latitude_x'], row['Longitude_x']], [row['Latitude_y'], row['Longitude_y']]), axis=1)
>>> df
id_x Latitude_x Longitude_x id_y Latitude_y Longitude_y result
0 1 33.110348 -83.259740 2 33.500308 -86.792691 331.157711
1 1 33.110348 -83.259740 3 30.428149 -92.326981 907.184077
2 1 33.110348 -83.259740 4 33.493309 -86.828201 334.342337
3 1 33.110348 -83.259740 5 36.433678 -89.025341 643.134695
4 2 33.500308 -86.792691 3 30.428149 -92.326981 623.748640
5 2 33.500308 -86.792691 4 33.493309 -86.828201 3.383468
6 2 33.500308 -86.792691 5 36.433678 -89.025341 384.390670
7 3 30.428149 -92.326981 4 33.493309 -86.828201 620.539431
8 3 30.428149 -92.326981 5 36.433678 -89.025341 734.575638
9 4 33.493309 -86.828201 5 36.433678 -89.025341 383.356886
If your DataFrame is big, apply is not the most performant method, so if you are worried about its performance, you can try np.vectorize.
%%timeit
df['result'] = df.apply(lambda row: hs.haversine([row['Latitude_x'], row['Longitude_x']], [row['Latitude_y'], row['Longitude_y']]), axis=1)
1.02 ms ± 33.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%%timeit
df['result'] = np.vectorize(hs.haversine)(df[['Latitude_x', 'Longitude_x']], df[['Latitude_y', 'Longitude_y']])
430 µs ± 9.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
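NumPy's documentation describes np.vectorize as a convenience wrapper that is essentially a loop, so for a really large DataFrame another option is to evaluate the haversine formula on whole columns at once. A minimal sketch, assuming distances in kilometres and a mean Earth radius of 6371.0088 km; haversine_np is a hypothetical helper written for this example, not part of the haversine package, and no timing is claimed for it here:

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_np(lat1, lon1, lat2, lon2):
    """Great-circle distance in km for scalars, arrays or DataFrame columns."""
    # Convert every input from degrees to radians as float arrays.
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(v, dtype=float)) for v in (lat1, lon1, lat2, lon2)
    )
    # Haversine formula, evaluated element-wise over the arrays.
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# First pair (ids 1 and 2) and the closest pair (ids 2 and 4) from the example.
d = haversine_np([33.110348, 33.500308], [-83.259740, -86.792691],
                 [33.500308, 33.493309], [-86.792691, -86.828201])
print(d)

# With the joined DataFrame this would be called as:
# df['result'] = haversine_np(df['Latitude_x'], df['Longitude_x'],
#                             df['Latitude_y'], df['Longitude_y'])
```

Because the function works on arrays directly, it makes no per-row Python call, which is where apply and np.vectorize spend most of their time.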