How can I look up a polygon that contains a point fast, given all polygons are a partition?

Question

I have a function

get_polygon(polygon_collection, point):
    for polygon in polygon_collection:
        if polygon.intersects(point):
            return polygon
    return None

This approach works, but it is in O(n) * O(single polygon check). This can for sure be reduced to O(log(n)) * O(single polygon check) if one builds a tree data structure.

Does shapely support that directly?

Structure of polyogon collection

The collection contains n MultiPolygons - hence one object can consist of many polygons.
Each MultiPolygon can have enclaves / exclaves
Typically (> 95%), they are simple polygons without holes
Typically (> 50%), the shape is rather simple (e.g. squares)

Use case example

The list of polygons could be postal code areas in Germany. That would be several thousands. Then I have GPS positions of me and some friends, that would be several thousands as well. I want to say in which area where we got most data points.

I don't think you're tackling the problem correctly (for your use case). Postal codes in Germany have a specific order, you should never check all postal codes there. Start with checking out the first digit so you can rule out ~9/10 already and never have to check those. Then narrow it down from there. Look at the postal codes as a tree. You would never iterate through all leaves if you can exclude a whole branch. — DonQuiKong, Nov 09 '18 at 11:09
For general polygons, pre-compute a sorted list of bounding boxes for the gps-achses and then figure out which few polygons the gps coordinate can be in for example. If your x-coordinate (or whatever the gps equivalent is) is 5 and your polygons are distributed from 0 to 100, only check those where the bounding box starts left from 5 and ends right from 5. That's iterating partly through 2 sorted lists. Same for y. — DonQuiKong, Nov 09 '18 at 11:11
@DonQuiKong The use case I described is way simpler than my actual use case. I already suggested that building a tree data structure is what I would like to do (in fact, I already wrote custom software for it). But I wonder if shapely has built-in support for building that tree and using it. — Martin Thoma, Nov 09 '18 at 12:09

eguaio · Answer 1 · 2018-11-22T23:29:51.613

RTrees are exactly what you're looking for. Shapely now supports the construction and use of RTrees. But the thing is that shapely rtrees only support rectangles. So, the recipe to use it can be as follows:

Build a rtree with minimum rectangles that contain the polygons.
For a given point, use the RTree to know what rectangles include the point. The point needs to be inflated a litle, since the rtree expects a box to check intersections. This will give you some false positives, but that's ok.
Filter the false positives to get the final result.

Unless your polygons are very strange, this will give you logarithmic time in the mean (logarithmic in the amount of polygons for each point). Of course, you need to pay the price of building the RTree structure, but that is also done in logarithmic time if the polygons are somehow equally distributed, and the rtree can be serialized for later use. (That is, adding a new box to the tree is logarithmic in the amount of boxes already present in the tree, so the total complexity less than n*log(n)).

An implementation of this is as follows.

from rtree import index
from shapely.geometry import Polygon, Point

def get_containing_box(p):
    xcoords = [x for x, _ in p.exterior.coords]
    ycoords = [y for _, y in p.exterior.coords]
    box = (min(xcoords), min(ycoords), max(xcoords), max(ycoords))   
    return box

def build_rtree(polys):
    def generate_items():
        pindx = 0
        for pol in polys:
            box = get_containing_box(pol)
            yield (pindx, box,  pol)
            pindx += 1
    return index.Index(generate_items())


def get_intersection_func(rtree_index):
    MIN_SIZE = 0.0001
    def intersection_func(point):
        # Inflate the point, since the RTree expects boxes:
        pbox = (point[0]-MIN_SIZE, point[1]-MIN_SIZE, 
                 point[0]+MIN_SIZE, point[1]+MIN_SIZE)
        hits = rtree_index.intersection(pbox, objects='raw')
        #Filter false positives:
        result = [pol for pol in hits if pol.intersects(Point(point)) ]
        return result
    return intersection_func

The example I used to test it:

triangles = [
 [(8.774239095627754, 32.342041683826224),
  (8.750148703126552, 32.899346596303054),
  (4.919576457288677, 35.41040289384488)],
 [(8.485955148136222, 32.115258769399446),
  (9.263360720698277, 30.065319757618354),
  (4.562638192761559, 38.541192819415855)],
 [(2.613649959824923, 38.14802347408093),
  (7.879211442172622, 35.97223726358858),
  (0.9726266770834235, 32.12523430143625)]
]

polys = [Polygon(t) for t in triangles]
my_rtree = build_rtree(polys)
my_func = get_intersection_func(my_rtree)


my_intersections =  my_func((5, 35))
for pol in my_intersections:
    print pol.exterior.coords[:]

yes! In fact, I should edit the answer to address that. Good catch! — eguaio, Jun 08 '20 at 18:46

How can I look up a polygon that contains a point fast, given all polygons are a partition?

Structure of polyogon collection

Use case example

1 Answers1