I have a DataFrame containing the columns lat and lng. I also have a GeoJSON FeatureCollection that contains a polygon. Given this polygon, how can I efficiently select only the rows of my df whose points lie within the polygon? I want to avoid looping over the df and checking each element manually.

import pandas as pd

d = {'lat': [0, 0.1, -0.1, 0.4],
     'lng': [50, 50.1, 49.6, 49.5]}

df = pd.DataFrame(d)

This is the feature collection that displays 1 polygon and the 4 points. As you can see, only the last point is outside.

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [
              0,
              49
            ],
            [
              0.6,
              50
            ],
            [
              0.1,
              52
            ],
            [
              -1,
              51
            ],
            [
              0,
              49
            ]
          ]
        ]
      }
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Point",
        "coordinates": [
          0,
          50
        ]
      }
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Point",
        "coordinates": [
          0.1,
          50.1
        ]
      }
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Point",
        "coordinates": [
          -0.1,
          49.6
        ]
      }
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Point",
        "coordinates": [
          0.4,
          49.5
        ]
      }
    }
  ]
}

This map displays the polygon and points.

Edit: The following is the code that I have at the moment, but as you can expect, it is very slow.

from shapely.geometry import shape, Point
# check each polygon to see if it contains the point
for feature in feature_collection['features']:
    polygon = shape(feature['geometry'])
    for index, row in dfr.iterrows():
        point = Point(row.location_lng, row.location_lat)
        if polygon.contains(point):
            print('Found containing polygon:', feature)

Here dfr is my DataFrame containing location_lat and location_lng. The feature_collection is a GeoJSON FeatureCollection that contains only polygons (note that the GeoJSON example above is just for explaining the question: it has only 1 polygon, plus some points to illustrate the problem).

otmezger
  • Is this link relevant?: https://stackoverflow.com/questions/36399381/whats-the-fastest-way-of-checking-if-a-point-is-inside-a-polygon-in-python – erncyp Nov 29 '18 at 18:00
  • thanks @erncyp that did not help me, as it uses matplotlib and I don't want to go that way. I would prefer doing it with a pandas like approach. – otmezger Nov 29 '18 at 19:37
  • Did you create the dataframe `df` from `feature_collection`? if yes how? and in your code you use `iterrows` on `dfr` not `df`, is it the same? – Ben.T Nov 29 '18 at 19:37
  • @Ben.T thanks for the questions. I'll try to clarify. The example above is just to explain the task. In general, I have a large dataframe (`dfr`) and a large feature collection, containing only polygons. I'll try to edit the question. – otmezger Nov 29 '18 at 19:40

1 Answer

Assuming you have your dataframe dfr like:

   location_lat  location_lng
0           0.0          50.0
1           0.1          50.1
2          -0.1          49.6
3           0.4          49.5

and the feature_collection containing polygons such as:

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0,49],[0.6,50],[0.1,52],[-1,51],[0,49]]]
      }
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0,50],[0.6,50],[0.1,52],[-1,51],[0,50]]]
      }
    }]
}

I changed 49 to 50 in the second polygon so that the other points fall outside it.

You can first create a column with the points in dfr:

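# Option 1: build one Point per row with apply.
# The (lat, lng) column order matches the example polygons above; standard GeoJSON is [lng, lat].
from shapely.geometry import Point
dfr['point'] = dfr[['location_lat', 'location_lng']].apply(Point, axis=1)

# Option 2: build all points at once through MultiPoint (faster).
# .geoms is required on Shapely 2.0+, where multi-part geometries are no longer iterable.
from shapely.geometry import MultiPoint
dfr['point'] = list(MultiPoint(dfr[['location_lat', 'location_lng']].values).geoms)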

The second method seems faster on a small dataframe, so I would use it for bigger dataframes too.

Now you can create one boolean column per polygon in feature_collection, indicating whether each point lies inside that feature, by looping over the features:

from shapely.geometry import shape
for i, feature in enumerate(feature_collection['features']):
    # one boolean column per feature: True where the polygon contains the point
    polygon = shape(feature['geometry'])
    dfr['feature_{}'.format(i)] = list(map(polygon.contains, dfr['point']))

Then dfr looks like:

   location_lat  location_lng              point  feature_0  feature_1
0           0.0          50.0       POINT (0 50)       True      False
1           0.1          50.1   POINT (0.1 50.1)       True       True
2          -0.1          49.6  POINT (-0.1 49.6)       True      False
3           0.4          49.5   POINT (0.4 49.5)      False      False
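
If the same polygons are tested against many points, a prepared geometry may reduce the cost of each contains call. The following is only a sketch under that assumption, not a benchmarked result: it rebuilds the example data from above, keeps the same (lat, lng) coordinate order, and uses shapely.prepared.prep. The names points and prepared_polygon are introduced here for illustration.

```python
import pandas as pd
from shapely.geometry import Point, shape
from shapely.prepared import prep

# Same example data as above
dfr = pd.DataFrame({'location_lat': [0, 0.1, -0.1, 0.4],
                    'location_lng': [50, 50.1, 49.6, 49.5]})
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 49], [0.6, 50], [0.1, 52], [-1, 51], [0, 49]]]}},
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 50], [0.6, 50], [0.1, 52], [-1, 51], [0, 50]]]}},
    ],
}

# Build the points once, in the same (lat, lng) order used above
points = [Point(lat, lng)
          for lat, lng in zip(dfr['location_lat'], dfr['location_lng'])]

for i, feature in enumerate(feature_collection['features']):
    # prep() prepares the polygon for repeated predicate calls
    prepared_polygon = prep(shape(feature['geometry']))
    dfr['feature_{}'.format(i)] = [prepared_polygon.contains(p) for p in points]

print(dfr)
```

The resulting feature_0 and feature_1 columns should match the table above; whether this is actually faster depends on the size and complexity of the polygons.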

To select the points that belong to a given feature (here feature_1), use its column as a boolean mask:

print(dfr.loc[dfr['feature_1'], ['location_lat', 'location_lng']])
   location_lat  location_lng
1           0.1          50.1
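
With many polygons it can be handy to combine the feature_N columns instead of checking them one at a time. A minimal sketch, assuming the columns are named feature_0, feature_1, ... as above; the names flags, inside and n_polygons are made up for this example, and the result table is typed in by hand without the point column:

```python
import pandas as pd

# Result table from above, without the 'point' column
dfr = pd.DataFrame({
    'location_lat': [0.0, 0.1, -0.1, 0.4],
    'location_lng': [50.0, 50.1, 49.6, 49.5],
    'feature_0': [True, True, True, False],
    'feature_1': [False, True, False, False],
})

# Pick only the boolean membership columns
flags = dfr.filter(regex=r'^feature_\d+$')

# Rows that fall inside at least one polygon
inside = dfr.loc[flags.any(axis=1), ['location_lat', 'location_lng']]

# Number of polygons containing each point
n_polygons = flags.sum(axis=1)

print(inside)
print(n_polygons)
```

Here inside keeps rows 0, 1 and 2, and the last row drops out because no polygon contains it.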
Ben.T