
I have a DataFrame `dfA` with a column called `geometry` containing the following geometric shapes:

```python
import pandas as pd

d = {'id': [1, 2], 'geometry': ['POINT (-70.66000 -33.45000)', 'POINT (-74.08000 4.60000)']}
dfA = pd.DataFrame(data=d)
dfA
```

|   | id | geometry              |
|---|----|-----------------------|
| 0 | 1  | POINT (-70.66 -33.45) |
| 1 | 2  | POINT (-74.08 4.6)    |

I would like to calculate the minimum geodesic distance from each of these shapes to each of the geometric shapes in the `geometry` column of `dfB`:

```python
d = {'id': [1, 2, 3], 'geometry': ['LINESTRING (-58.66000 -34.58000, -59.66000 -35.58000)', 'LINESTRING (-47.91000 -15.78000, -48.91000 -16.78000)', 'POINT (-66.86000 10.48000)']}
dfB = pd.DataFrame(data=d)
dfB
```

|   | id | geometry                                  |
|---|----|-------------------------------------------|
| 0 | 1  | LINESTRING (-58.66 -34.58, -59.66 -35.58) |
| 1 | 2  | LINESTRING (-47.91 -15.78, -48.91 -16.78) |
| 2 | 3  | POINT (-66.86 10.48)                      |

I have tried to do this calculation using the Python shapely and geopandas libraries by following the steps below:

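```python
import pandas as pd
import geopandas as gpd
from shapely import wkt

# Parse the WKT strings into shapely geometries
dfA['geometry'] = dfA['geometry'].apply(wkt.loads)
dfA = gpd.GeoDataFrame(dfA, geometry='geometry')
dfB['geometry'] = dfB['geometry'].apply(wkt.loads)

# Add one column to dfA per geometry in dfB, named after its id,
# holding the (planar) distance from each dfA geometry to it
for _, row in dfB.iterrows():
    dfA[str(row['id'])] = dfA['geometry'].distance(row['geometry'])
dfA
```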

|   | id | geometry              | 1           | 2           | 3           |
|---|----|-----------------------|-------------|-------------|-------------|
| 0 | 1  | POINT (-70.66 -33.45) | 11.20432506 | 27.40349248 | 44.09404608 |
| 1 | 2  | POINT (-74.08 4.6)    | 42.10521108 | 33.0247377  | 9.311433832 |

Unfortunately, shapely's distance function calculates the planar Euclidean distance between the raw coordinates (so the values above are in degrees), not the geodesic distance.
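For comparison, a minimal stdlib sketch of a point-to-point distance in kilometres using the haversine formula. It assumes a spherical Earth with mean radius 6371.0088 km, so it only approximates the ellipsoidal (WGS84) geodesic distance, with errors on the order of 0.5% at most; `haversine_km` is a hypothetical helper name. For true ellipsoidal geodesics, a library such as `pyproj` (`Geod.inv`) is the usual choice.

```python
import math

# Assumed mean Earth radius in km (spherical model, not WGS84)
EARTH_RADIUS_KM = 6371.0088

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points in degrees.

    Spherical approximation of the geodesic distance.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing a slightly above 1 for antipodes
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

# dfA id 2, POINT (-74.08 4.6), to dfB id 3, POINT (-66.86 10.48)
print(round(haversine_km(-74.08, 4.6, -66.86, 10.48), 1))
```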

Another strategy would be to use a function that calculates the geodesic distance from point A to every vertex of line B [B1, B2, B3, ...] and keep the minimum, that is: `dist(A, B) = min(geodist(A, B1), geodist(A, B2), geodist(A, B3), ...)`.
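The vertex-minimum strategy can be sketched in plain Python as below, again with a spherical haversine distance standing in for the geodesic distance (an assumption). `densify` and `min_dist_to_line_km` are hypothetical helpers. Because only vertices are checked, a point whose closest approach lies in the middle of a segment gets an overestimated distance; the optional `max_step_deg` interpolates extra vertices linearly in lon/lat (assuming no antimeridian crossing) to reduce that error, at the cost of even more distance evaluations.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius (spherical model)

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

def densify(coords, max_step_deg):
    """Insert evenly spaced points (linear in lon/lat) so no gap exceeds max_step_deg.

    Hypothetical helper; assumes the line does not cross the antimeridian.
    """
    out = [coords[0]]
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        n = max(1, math.ceil(max(abs(x1 - x0), abs(y1 - y0)) / max_step_deg))
        out.extend((x0 + (x1 - x0) * k / n, y0 + (y1 - y0) * k / n) for k in range(1, n + 1))
    return out

def min_dist_to_line_km(point, coords, max_step_deg=None):
    """dist(A, B) = min_i geodist(A, B_i) over the (optionally densified) vertices B_i."""
    if max_step_deg is not None:
        coords = densify(coords, max_step_deg)
    lon, lat = point
    return min(haversine_km(lon, lat, x, y) for x, y in coords)

# dfA id 1 against dfB id 1: LINESTRING (-58.66 -34.58, -59.66 -35.58)
pt_a = (-70.66, -33.45)
line = [(-58.66, -34.58), (-59.66, -35.58)]
print(min_dist_to_line_km(pt_a, line))                     # vertices only
print(min_dist_to_line_km(pt_a, line, max_step_deg=0.01))  # densified line
```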

This works, but it is computationally very expensive, since it means computing distances from thousands of points to thousands of lines. Any more efficient way to perform this calculation would be a great help.

David Ordoñez
  • Just a note: you can use `pyproj` or `geopy` to calculate the geodesic distance between points, but I am not aware of an efficient geodesic distance method between linestrings or polygons. – martinfleis May 22 '20 at 12:57
  • I think this is what the OP is referring to in the second half of the question; the issue is that they're basically calculating the distance from all the points in the first linestring to all the points in linestring 2. – Andre.IDK May 22 '20 at 16:51

1 Answer


If you can reduce the problem to finding the geodesic distance to a collection of points, then a vantage-point tree will give you an efficient solution. See my answer to a similar question here; it includes the solution in Python.
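A minimal sketch of the vantage-point-tree idea, assuming the lines are reduced to their vertices. It swaps in the spherical haversine distance as the metric (an assumption, not necessarily what the linked answer uses); `VPNode`, `build_vp_tree` and `nearest` are hypothetical names. Pruning relies only on the triangle inequality, so any true metric, such as an ellipsoidal geodesic distance, could be plugged in instead.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius (spherical model)


def haversine_km(p, q):
    """Great-circle distance in km between (lon, lat, ...) tuples in degrees.

    On a sphere this is a true metric, which is what a VP tree requires.
    """
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


class VPNode:
    __slots__ = ("point", "radius", "inside", "outside")

    def __init__(self, point, radius, inside, outside):
        self.point = point      # the vantage point
        self.radius = radius    # median distance from point to its descendants
        self.inside = inside    # subtree of points with distance < radius
        self.outside = outside  # subtree of points with distance >= radius


def build_vp_tree(points, rng=None):
    """Recursively build a vantage-point tree over a list of points."""
    if rng is None:
        rng = random.Random(0)
    if not points:
        return None
    i = rng.randrange(len(points))
    vp, rest = points[i], points[:i] + points[i + 1:]
    if not rest:
        return VPNode(vp, 0.0, None, None)
    dists = [haversine_km(vp, p) for p in rest]
    mu = sorted(dists)[len(dists) // 2]  # median distance splits the rest in two
    inside = [p for p, d in zip(rest, dists) if d < mu]
    outside = [p for p, d in zip(rest, dists) if d >= mu]
    return VPNode(vp, mu, build_vp_tree(inside, rng), build_vp_tree(outside, rng))


def nearest(node, q, best=(math.inf, None)):
    """Return (distance_km, point) of the stored point closest to q."""
    if node is None:
        return best
    d = haversine_km(q, node.point)
    if d < best[0]:
        best = (d, node.point)
    # Search the side containing q first; visit the other side only if the
    # triangle inequality says it could still hold a closer point.
    if d < node.radius:
        best = nearest(node.inside, q, best)
        if d + best[0] >= node.radius:
            best = nearest(node.outside, q, best)
    else:
        best = nearest(node.outside, q, best)
        if d - best[0] < node.radius:
            best = nearest(node.inside, q, best)
    return best


# Vertices of dfB's geometries, each tagged with its dfB id
vertices = [(-58.66, -34.58, 1), (-59.66, -35.58, 1),
            (-47.91, -15.78, 2), (-48.91, -16.78, 2),
            (-66.86, 10.48, 3)]
tree = build_vp_tree(vertices)
for q in [(-70.66, -33.45), (-74.08, 4.6)]:
    dist_km, (lon, lat, dfb_id) = nearest(tree, q)
    print(q, dfb_id, round(dist_km, 1))
```

On typical data a query visits only part of the tree instead of every vertex. Note that this returns the single nearest vertex across all lines rather than a per-line minimum, and, like the vertex-minimum strategy, it can miss a closer point in the interior of a segment unless the lines are densified first.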

cffk