4

I have two arrays of different length. One contains shapely polygons, the other contains shapely points. I want to run the a_polygon.contains(a_point) shapely function for every possible combination of elements in both arrays.

I was looking at this post as building a two column matrix with all the possible combinations in the rows could be a desirable intermediate step. But the loop in the 'cartersian(arrays)' function might hinder performance when the input data is huge.

I tried to broadcast one of the arrays, and then applying the shapely function:

Polygons_array[:,newaxis].contains(Points_array)

but that, off course, does not work. I am aware of the recently released geopandas library, but it is not an option for my Canopy installation.

Community
  • 1
  • 1
Jaqo
  • 329
  • 1
  • 5
  • 15
  • 2
    In my understanding using Numpy arrays is only fast when they contain basic types like for example `bool` or `double`, not general Python objects like the ones from the Shapely package. –  Jul 24 '14 at 13:28
  • This is good to know. Can somebody else elaborate on this please? By the way, below I posted a way to do the whole process. – Jaqo Jul 24 '14 at 17:06

1 Answers1

5

The following code shows how to apply a function on geometric objects contained in two arrays of different length. This approach avoids using loops. Pandas' apply and Numpy's .vectorize and broadcasting options are required.

First consider doing some imports and the two following arrays:

import numpy as np
import pandas as pd
from shapely.geometry import Polygon, Point

polygons = [[(1,1),(4,3),(4,1),(1,1)],[(2,4),(2,6),(4,6),(4,4),(2,4)],[(8,1),(5,1),(5,4),(8,1)]]
points = [(3,5),(7,3),(7,6),(3,2)]

An array containing geometric objects for polygons and points can be obtained following this:

geo_polygons = pd.DataFrame({'single_column':polygons}).single_column.apply(lambda x: Polygon(x)).values
geo_points = pd.DataFrame({'single_column':points}).single_column.apply(lambda x: Point(x[0], x[1])).values
# As you might noticed, the arrays have different length.

Now the function to be applied on both arrays is defined and vectorized:

def contains(a_polygon, a_point):
    return a_polygon.contains(a_point)
contains_vectorized = np.vectorize(contains)

In that way, the function is prepared to be applied to every element in a vector. Broadcasting the points array handles the pairwise evaluation:

contains_vectorized(geo_polygons, geo_points[:,np.newaxis])

Which returns the following array:

array([[False,  True, False],
   [False, False, False],
   [False, False, False],
   [ True, False, False]], dtype=bool)

The columns correspond to the polygons and the rows to the points. The boolean values in that array show, e.g., that the first point is within the second polygon. Which is OK. Mapping the polygons and the points will prove that right:

from descartes import PolygonPatch
import matplotlib.pyplot as plt
fig = plt.figure(1, figsize = [10,10], dpi = 300)
ax = fig.add_subplot(111)
offset_x = lambda xy: (xy[0] + 0.1, xy[1])
offset_y = lambda xy: (xy[0], xy[1] - 0.5)
for i,j in enumerate(geo_polygons):
    ax.add_patch(PolygonPatch(j, alpha=0.5))
    plt.annotate('polygon {}'.format(i + 1), xy= offset_y(tuple(j.centroid.coords[0])))
for i,j in enumerate(geo_points):
    ax.add_patch(PolygonPatch(j.buffer(0.07),fc='orange',ec='black'))
    plt.annotate('point {}'.format(i + 1), xy= offset_x(tuple(j.coords[0])))
ax.set_xlim(0, 9)
ax.set_ylim(0, 7)
ax.set_aspect(1)
plt.show()

Mapped geometries

Jaqo
  • 329
  • 1
  • 5
  • 15
  • 1
    I wanted to add that in the case of large data sets, it might be preferable to use a spatial index. This saves computations by avoiding full intersection test between geometries that are far from each other. See the [RTree library](http://toblerity.org/rtree/), for example. – Jaqo Jul 09 '15 at 07:31