
I am working on a DataFrame that looks like this:

             lat       lon
id_zone
0        40.0795  4.338600
1        45.9990  4.829600
2        45.2729  2.882000
3        45.7336  4.850478
4        45.6981  5.043200

I'm trying to build a Haversine distance matrix. For each zone, I would like to calculate the distance between it and every other zone in the DataFrame, so the diagonal should contain only zeros. Here is the Haversine function I use, but I can't work out how to build the matrix from it.

def haversine(x):
    x.lon, x.lat, x.lon2, x.lat2 = map(radians, [x.lon, x.lat, x.lon2, x.lat2])
    # Haversine formula
    dlon = x.lon2 - x.lon
    dlat = x.lat2 - x.lat
    a = sin(dlat / 2) ** 2 + cos(x.lat) * cos(x.lat2) * sin(dlon / 2) ** 2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    km = 6367 * c
    return km
martineau
firecfly
    Here on StackOverflow you should not ask for a complete solution. Try to solve your task and ask about a specific problem you encounter. "I can't make my matrix." does not describe your problem enough. Show your relevant code (as text, not picture), describe what you expect it to do and what really happens. – pabouk - Ukraine stay strong Apr 01 '22 at 08:47

1 Answer


You can adapt the approach from the answer to "Pandas - Creating Difference Matrix from Data Frame".

Or in your specific case, where you have a DataFrame like this example:

             lat       lon
id_zone
0        40.0795  4.338600
1        45.9990  4.829600
2        45.2729  2.882000
3        45.7336  4.850478
4        45.6981  5.043200

And your function is defined as:

import numpy as np

def haversine(first, second):
    # first and second are rows with "lat" and "lon" in decimal degrees;
    # label-based access avoids positional lookups on a labelled Series
    lat, lon, lat2, lon2 = map(np.radians, [first["lat"], first["lon"], second["lat"], second["lon"]])

    # haversine formula
    dlon = lon2 - lon
    dlat = lat2 - lat
    a = np.sin(dlat/2)**2 + np.cos(lat) * np.cos(lat2) * np.sin(dlon/2)**2
    c = 2 * np.arcsin(np.sqrt(a))
    r = 6371 # Radius of earth in kilometers. Use 3956 for miles
    return c * r

Here `first` and `second` are the rows for the two locations, each holding a latitude and a longitude in decimal degrees.

You can then create a zero-filled distance matrix with NumPy and fill each entry with the result of the haversine function:

import pandas as pd

# create a matrix for the distances between each pair of zones
# (df is the DataFrame shown above, indexed by id_zone)
distances = np.zeros((len(df), len(df)))
for i in range(len(df)):
    for j in range(len(df)):
        distances[i, j] = haversine(df.iloc[i], df.iloc[j])
pd.DataFrame(distances, index=df.index, columns=df.index)

Your output should be similar to this:

id_zone           0           1           2           3           4
id_zone
0          0.000000  659.422944  589.599339  630.083979  627.383858
1        659.422944    0.000000  171.597296   29.555376   37.325316
2        589.599339  171.597296    0.000000  161.731366  174.983855
3        630.083979   29.555376  161.731366    0.000000   15.474533
4        627.383858   37.325316  174.983855   15.474533    0.000000
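
For larger DataFrames the nested loop gets slow, since it calls the function once per pair. As an alternative, here is a minimal sketch that computes the same matrix in one pass with NumPy broadcasting. The name `haversine_matrix` and its `radius_km` parameter are my own, not part of any library, and the sample `df` is rebuilt from the data in the question:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; id_zone is the index.
df = pd.DataFrame(
    {
        "lat": [40.0795, 45.9990, 45.2729, 45.7336, 45.6981],
        "lon": [4.338600, 4.829600, 2.882000, 4.850478, 5.043200],
    },
    index=pd.Index(range(5), name="id_zone"),
)


def haversine_matrix(frame, radius_km=6371.0):
    """Pairwise haversine distances (km) between all rows of `frame`.

    Hypothetical helper: assumes `frame` has "lat" and "lon" columns
    in decimal degrees.
    """
    lat = np.radians(frame["lat"].to_numpy())
    lon = np.radians(frame["lon"].to_numpy())

    # Column vector minus row vector broadcasts to an n x n difference matrix.
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]

    a = (
        np.sin(dlat / 2) ** 2
        + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2
    )
    # Clip guards against rounding pushing `a` marginally outside [0, 1].
    c = 2 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    return pd.DataFrame(radius_km * c, index=frame.index, columns=frame.index)


distances = haversine_matrix(df)
print(distances.round(3))
```

This uses the same formula and Earth radius as the loop version, so the values should agree with the table above. Memory grows with the square of the number of zones, so for very many zones the loop, or a chunked variant, may still be the safer choice.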
martineau
Osiris