1

How to efficiently assign a point a polygon's column value when there are many points?

This is an extention (many to many version) of the question found here: Check if a geopoint with latitude and longitude is within a shapefile

I have a set of points (~184 million x 2d numpy array) and a large shapefile with maybe 100k polygons (they don't overlap).

In R there is an %over% operator such that points %over% polygons returns an array of polygons of length points where the polygon is either the one the point is within or NaN. R doesn't scale, eats RAM and is slow so isn't an option.

The problem is that the solution in the related question mentioned is slow (1000 points takes 20 seconds). And if instead of changing the single point location that goes in SetSpatialFilter, I fill a wkbMultiPoint with points first, it takes only 13 minutes for 20 million points (most are outside polygons). The speed is fine but I have no way of knowing which feature belongs to which point. I put 20 million points into the multipoint but only get 1608 features, I need the features to equal the length of input or at least be relatable via index.

Doesn't have to be a geospatial calculation because the projection is in metres east and north.

  • The algorithm needed is basically arbitrary point rasterization. The work around is rasterizing the layer finer than required using gdal, then using a method for vector data instead (easy and fast). Crazy that rasterization takes 3 seconds and uses next to no ram! – user2432218 Jan 22 '21 at 00:54

0 Answers0