5

I have two polygons, P and Q, where the exterior linear ring of a polygon is defined by two closed sets of points, stored as numpy arrays, connected in a counterclockwise direction. P and Q are in the following format:

P['x_coords'] = [299398.56 299402.16 299410.25 299419.7  299434.97 299443.75 299454.1 299465.3  299477.   299488.25 299496.8  299499.5  299501.28 299504. 299511.62 299520.62 299527.8  299530.06 299530.06 299525.12 299520.2 299513.88 299508.5  299500.84 299487.34 299474.78 299458.6  299444.66 299429.8  299415.4  299404.84 299399.47 299398.56 299398.56] 
P['y_coords'] = [822975.2  822989.56 823001.25 823005.3  823006.7  823005.06 823001.06 822993.4  822977.2  822961.   822943.94 822933.6  822925.06 822919.7 822916.94 822912.94 822906.6  822897.6  822886.8  822869.75 822860.75 822855.8  822855.4  822857.2  822863.44 822866.6  822870.6  822876.94 822886.8  822903.   822920.3  822937.44 822954.94 822975.2]

Q['x_coords'] = [292316.94 292317.94 292319.44 292322.47 292327.47 292337.72 292345.75 292350.   292352.75 292353.5  292352.25 292348.75 292345.75 292342.5 292338.97 292335.97 292333.22 292331.22 292329.72 292324.72 292319.44 292317.2  292316.2  292316.94]
Q['y_coords'] = [663781.   663788.25 663794.   663798.06 663800.06 663799.3  663796.56 663792.75 663788.5  663782.   663773.25 663766.   663762.   663758.25 663756.5  663756.25 663757.5  663761.   663763.75 663767.5  663769.5 663772.25 663777.5  663781.  ]

## SIMPLIFIED AND FORMATTED FOR EASY TESTING:
import numpy as np

px_coords = np.array([299398,299402,299410.25,299419.7,299398])
py_coords = np.array([822975.2,822920.3,822937.44,822954.94,822975.2])

qx_coords = np.array([292316,292331.22,292329.72,292324.72,292319.44,292317.2,292316])
qy_coords = np.array([663781,663788.25,663794,663798.06,663800.06,663799.3,663781])

The exterior ring of P is formed by joining P['x_coords'][0], P['y_coords'][0] -> P['x_coords'][1], P['y_coords'][1] etc. The last coordinate of each array is the same as the first, indicating that the shape is topologically closed.

Is it possible to calculate a simple minimum distance between the exterior rings of P and Q geometrically using numpy? I have searched high and low on SO without finding anything explicit, so I suspect this may be a drastic oversimplification of a very complex problem. I am aware that distance calculations can be done with out-of-the-box spatial libraries such as GDAL or Shapely, but I'm keen to understand how these work by building something from scratch in numpy.

Some things I have considered or tried:

  • Calculate the distance between each point in both arrays. This doesn't work as the closest point between P and Q can be an edge-vertex pair. Using the convex hull of each shape, calculated using scipy.spatial has the same problem.
  • An inefficient brute force approach calculating the distance between every pair of points, and every combination of edge-point pairs

Is there a better way to go about this problem?

bm13563
  • 688
  • 5
  • 18
  • 1
    there are a number of ways of doing this but first you need to decide what you mean by 'distance'. Possibly the most common way is to calculate where the centroid each polygon is then calculate the distance between the centroids or find the vector between centroids then calculate where edges cross that vector. – DrBwts Oct 18 '19 at 10:23
  • Good point - I'm looking for a minimum distance between the exterior rings of the two sets. I have updated my question to reflect this – bm13563 Oct 18 '19 at 10:26
  • HA! just updated my comment after I saw your edit – DrBwts Oct 18 '19 at 10:27
  • I think you need both point-point and line-point distances, see middle part of https://stackoverflow.com/a/56634290/7916438 (below the horizontal line) – tevemadar Oct 18 '19 at 10:27
  • are there any assumptions about convexity and intersection? – Marat Oct 18 '19 at 21:10
  • Are you wanting to consider large polygons (perhaps considered as being >1000 sides)? Will you need to compute multiple distances to any one polygon? – Davis Herring Oct 19 '19 at 03:06
  • @Marat We can assume that the two shapes do not intersect for simplicity. I'm less keen to make assumptions about the convexivity of the shape as ideally I'd like a solution that works for any shape. I'm aware that the problem becomes easier for convex shapes though, so I would still be happy to explore solutions for this – bm13563 Oct 19 '19 at 07:16
  • @DavisHerring ideally yes to both. I'm keen to understand how something like shapely or gdal works, including their scalability. However, any solution will still be interesting – bm13563 Oct 19 '19 at 07:20

2 Answers2

1

Thanks to Davis Herring for his answer - his is not the solution I ended up using (because I'm not very familiar with recursion) but I used the principals he outlined to develop a solution. I am planning on building an index into this solution, as suggested by Davis, to help with very large polygons.

I ended using a brute force approach that compares the distance between each edge of both polygons against each other, calculates the distance, and keeps track the minimum distance. I adapted the answers provided in this question: Shortest distance between two line segments. This method is very loop heavy and was running very slowly, so I adapted it to run in cython to improve efficiency.

pure python
shape a edges: 33
shape b edges: 15
total loops: 1000
total time = 6.889256715774536
average time per loop = 0.006896152868643179
max time per loop = 0.022176027297973633
min time per loop = 0.0

cython loop
shape a edges: 33
shape b edges: 15
total loops: 1000
total time = 0.046829938888549805
average time per loop = 4.687681570425406e-05
max time per loop = 0.015621423721313477
min time per loop = 0.0

I have attached the pure python version of the code below for clarity, and can provide the cython one if needed.

import numpy as np
import time
import math


def segments_distance(x11, y11, x12, y12, x21, y21, x22, y22):
    if segments_intersect(x11, y11, x12, y12, x21, y21, x22, y22): return 0
    distances = []
    distances.append(point_segment_distance(x11, y11, x21, y21, x22, y22))
    distances.append(point_segment_distance(x12, y12, x21, y21, x22, y22))
    distances.append(point_segment_distance(x21, y21, x11, y11, x12, y12))
    distances.append(point_segment_distance(x22, y22, x11, y11, x12, y12))
    return min(distances)


def segments_intersect(x11, y11, x12, y12, x21, y21, x22, y22):
    dx1 = x12 - x11
    dy1 = y12 - y11
    dx2 = x22 - x21
    dy2 = y22 - y21
    delta = dx2 * dy1 - dy2 * dx1
    if delta == 0: return False  # parallel segments
    s = (dx1 * (y21 - y11) + dy1 * (x11 - x21)) / delta
    t = (dx2 * (y11 - y21) + dy2 * (x21 - x11)) / (-delta)
    return (0 <= s <= 1) and (0 <= t <= 1)


def point_segment_distance(px, py, x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    if dx == dy == 0:  # the segment's just a point
        return math.hypot(px - x1, py - y1)

    # Calculate the t that minimizes the distance.
    t = ((px - x1) * dx + (py - y1) * dy) / (dx * dx + dy * dy)

    # See if this represents one of the segment's
    # end points or a point in the middle.
    if t < 0:
        dx = px - x1
        dy = py - y1
    elif t > 1:
        dx = px - x2
        dy = py - y2
    else:
        near_x = x1 + t * dx
        near_y = y1 + t * dy
        dx = px - near_x
        dy = py - near_y

    return math.hypot(dx, dy)

px_coords=np.array([299398.56,299402.16,299410.25,299419.7,299434.97,299443.75,299454.1,299465.3,299477.,299488.25,299496.8,299499.5,299501.28,299504.,299511.62,299520.62,299527.8,299530.06,299530.06,299525.12,299520.2,299513.88,299508.5,299500.84,299487.34,299474.78,299458.6,299444.66,299429.8,299415.4,299404.84,299399.47,299398.56,299398.56])
py_coords=np.array([822975.2,822989.56,823001.25,823005.3,823006.7,823005.06,823001.06,822993.4,822977.2,822961.,822943.94,822933.6,822925.06,822919.7,822916.94,822912.94,822906.6,822897.6,822886.8,822869.75,822860.75,822855.8,822855.4,822857.2,822863.44,822866.6,822870.6,822876.94,822886.8,822903.,822920.3,822937.44,822954.94,822975.2])
qx_coords=np.array([384072.1,384073.2,384078.9,384085.7,384092.47,384095.3,384097.12,384097.12,384093.9,384088.9,384082.47,384078.9,384076.03,384074.97,384073.53,384072.1])
qy_coords=np.array([780996.8,781001.1,781003.6,781003.6,780998.25,780993.25,780987.9,780981.8,780977.5,780974.7,780974.7,780977.2,780982.2,780988.25,780992.5,780996.8])

px_edges = np.stack((px_coords, np.roll(px_coords, -1)),1)
py_edges = np.stack((py_coords, np.roll(py_coords, -1)),1)
p_edges = np.stack((px_edges, py_edges), axis=-1)[:-1]

qx_edges = np.stack((qx_coords, np.roll(qx_coords, -1)),1)
qy_edges = np.stack((qy_coords, np.roll(qy_coords, -1)),1)
q_edges = np.stack((qx_edges, qy_edges), axis=-1)[:-1]

timings = []
for i in range(1,1000):
    start = time.time()

    edge_distances = [segments_distance(p_edges[n][0][0],p_edges[n][0][1],p_edges[n][1][0],p_edges[n][1][1],q_edges[m][0][0],q_edges[m][0][1],q_edges[m][1][0],q_edges[m][1][1]) for m in range(0,len(q_edges)) for n in range(0,len(p_edges))]

    end = time.time() - start
    timings.append(end)

print(f'shape a edges: {len(px_coords)}')
print(f'shape b edges: {len(qy_coords)}')
print(f'total loops: {i+1}')
print(f'total time = {sum(timings)}')
print(f'average time per loop = {sum(timings)/len(timings)}')
print(f'max time per loop = {max(timings)}')
print(f'min time per loop = {min(timings)}')
bm13563
  • 688
  • 5
  • 18
  • 1
    Note however that for larger inputs, k-d tree algorithms are more efficient than the brute-force approach you use. – norok2 Oct 21 '19 at 16:28
  • ah i see what you mean - i was thinking they could only be used between polygons, but you're right of course, if the polygon was large enough a kd tree would help. i'll edit my answer now – bm13563 Oct 21 '19 at 16:33
1

There are many variations on a k-d tree for storing objects with extent, like the edges of your polygons. The approach with which I am most familiar (but have no link for) involves associating an axis-aligned bounding box with each node; the leaves correspond to the objects, and an internal node’s box is the smallest enclosing both of its children’s (which in general overlap). The usual median-cut approach is applied to the midpoints of the object’s boxes (for line segments, this is their midpoint).

Having built these for each polygon, the following dual recursion finds the closest approach:

def closest(k1,k2,true_dist):
  return _closest(k1,0,k2,0,true_dist,float("inf"))

def _closest(k1,i1,k2,i2,true_dist,lim):
  b1=k1.bbox[i1]
  b2=k2.bbox[i2]
  # Call leaves their own single children:
  cc1=k1.child[i1] or (i1,)
  cc2=k2.child[i2] or (i2,)
  if len(cc1)==1 and len(cc2)==1:
    return min(lim,true_dist(i1,i2))
  # Consider 2 or 4 pairs of children, possibly-closest first:
  for md,c1,c2 in sorted((min_dist(k1.bbox[c1],k2.bbox[c2]),c1,c2)
                         for c1 in cc1 for c2 in cc2):
    if md>=lim: break
    lim=min(lim,_closest(k1,c1,k2,c2,true_dist,lim)
  return lim

Notes:

  • The closest approach true_dist between two non-intersecting line segments must involve at least one endpoint.
  • The distance between a point and a segment can be greater than that between the point and the line containing the segment.
  • No point-point checks are needed: such a pair will be found (four times) via the adjacent edges.
  • The bounding-box arguments to min_dist may be overlapping, in which case it must return 0.
Davis Herring
  • 36,443
  • 4
  • 48
  • 76
  • thanks for this, will try to understand/ implement tomorrow – bm13563 Oct 20 '19 at 18:27
  • hi @Davis Herring, i've been trying to implement a kd-tree into my solution this morning. i've tried to follow through your dual recursion to help, but i have a couple of questions. 1) is the kd-tree built on just the 'target' polygon and then queried by each point in the other polygon, or are both polygons indexed? 2) kd-trees appear to be for points only - you mention that the midpoint of line segments can be used as the nodes. this suggests to me that the kd-tree is built on the midpoints, but i dont think i can be understanding right as this could lead to erroneous results in some cases? – bm13563 Oct 22 '19 at 09:44
  • @bm13563: Each polygon gets a tree. Some queries (like “minimum distance from polygon to point”) are answered by recursively analyzing one tree; this algorithm recurses on both simultaneously. The midpoints are used to help partition the (bounding boxes for the) edges, but the bounding boxes are what is stored in the tree (of this **special** design!) and they guarantee a complete analysis. – Davis Herring Oct 22 '19 at 12:58
  • thanks for your help. unfortunately i really can't get to grips with building the kdtree - i tried adapting this implementation to accept both points of each segment https://github.com/Vectorized/Python-KD-Tree/blob/master/kdtree.py, but have just ended up with a horrible mess of lists. i think i'm going to need to come back to this answer when i have a better grasp on recursion – bm13563 Oct 22 '19 at 15:15
  • apologies for continually coming back to you. i've written my own scripts for building/ querying kd trees for points, that are simpler than the one linked above, and this has helped me to better understand both kd trees and recursion. now that i've got points down, i've come back to line segments. are the nodes built solely on the edge midpoints, or are both endpoints also used to partition the tree? i struggle to see how you could have reference to the endpoints (which are presumably needed for the distance calculation) if only the midpoints are used to partition the tree – bm13563 Oct 25 '19 at 16:01
  • @bm13563: Each leaf pertains to a single segment and “contains” it as well as its bounding box. (Quotes because the segment might just be an array index and the bounding box (for just two points) might be computed as needed.) Internal nodes have only (their children and) a bounding box (which is certainly stored). The midpoints of the segments are used only during tree construction to guide each split (and might be computed just like their bounding boxes). – Davis Herring Oct 25 '19 at 18:20
  • brilliant! i've built a kd tree that stores the edges of the polygons, using an index like you suggested. i can now query this to correctly get the distance from a point to a polygon using the tree. i'm now trying to tackle the idea of dual recursion outlined above i.e the closest point in a set of points to the polygon (seems to be called the bichromatic pair problem?). in your example, what do i1 and i2 represent? they look like indexes - do they represent the edges of the polygons, in which case does this recursion need to be performed on all of the edges in one of the polygons? – bm13563 Oct 28 '19 at 11:58
  • @bm13563: Make sure you have one tree per polygon (it’s not clear from the above if you do), and that it’s segment-segment distances you’re computing. Then `i1` and `i2` are just indices (or some other sort of references) into each tree that start at the root (“0” above); if you had to do every segment pair, that would just be brute force. – Davis Herring Oct 28 '19 at 13:35
  • thanks, yes i was clear on that - because my knowledge on the topic is limited i've been trying to tackle each stage from first principles, i find it easier to start with points and work up to segments. i have two further questions (sorry): 1) what is the difference between your min_dist and true_dist functions? 2) what is the "or (i1,)" doing after setting cc1 - i assume this is something to do with switching the axis of the tree? – bm13563 Oct 28 '19 at 14:32
  • @bm13563: See clarifying edit. No axis-switching is mentioned here; the search doesn’t care *how* the tree is partitioned, though its performance depends on how *well* it is. – Davis Herring Oct 28 '19 at 14:43
  • and what initial value do you pass through as true_dist? – bm13563 Oct 28 '19 at 17:50
  • @bm13563: `true_dist` is a *function*, so that the same *k*-*d* tree code can handle any kind of object. (`min_dist` is a function only of bounding boxes, so it need not be so generic.) – Davis Herring Oct 29 '19 at 02:04
  • i've finally got it, thanks. need to do some final checks and write up the whole pipeline, will update my answer when i've done so. – bm13563 Oct 29 '19 at 10:24
  • haven't been able to consistently get correct results, despite many attempts. think i understand your answer, just struggling to implement it, so have marked your answer as correct and opened a new question here: https://stackoverflow.com/questions/58616886/dual-recursion-across-kd-trees-to-find-the-closest-approach-between-two-sets-of – bm13563 Oct 29 '19 at 23:48