I have a DataFrame containing the columns lat and lng. I also have a FeatureCollection GeoJSON that contains a polygon. Given this polygon, how can I filter my df efficiently and select only the rows that lie within the polygon? I want to avoid looping over the df and checking each element manually.
import pandas as pd

d = {'lat': [0, 0.1, -0.1, 0.4],
     'lng': [50, 50.1, 49.6, 49.5]}
df = pd.DataFrame(d)
This is the feature collection containing one polygon and the four points. As you can see, only the last point is outside the polygon.
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [[0, 49], [0.6, 50], [0.1, 52], [-1, 51], [0, 49]]
        ]
      }
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {"type": "Point", "coordinates": [0, 50]}
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {"type": "Point", "coordinates": [0.1, 50.1]}
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {"type": "Point", "coordinates": [-0.1, 49.6]}
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {"type": "Point", "coordinates": [0.4, 49.5]}
    }
  ]
}
This map displays the polygon and the points.
Edit: The following is the code that I have at the moment, but as you can expect, it is very slow.
from shapely.geometry import shape, Point

# check each polygon to see if it contains the point
for feature in feature_collection['features']:
    polygon = shape(feature['geometry'])
    for index, row in dfr.iterrows():
        point = Point(row.location_lng, row.location_lat)
        if polygon.contains(point):
            print('Found containing polygon:', feature)
Here, dfr is my DataFrame containing location_lat and location_lng. The feature_collection is a GeoJSON FeatureCollection that contains only polygons (note that the GeoJSON example above is just for explaining the question: it has only 1 polygon, plus some points to illustrate the problem).
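To make the goal concrete, here is a rough sketch of the kind of loop-free filtering I am after, written with plain NumPy rather than shapely. It is an even-odd (ray casting) test applied to whole columns at once, so the only Python loop is over the polygon's edges. The helper name points_in_ring is hypothetical, and the sketch assumes a single closed exterior ring with no holes. It also assumes the sample GeoJSON above stores coordinates as [lat, lng] (which is how the points line up with df); standard GeoJSON uses [lng, lat], in which case the two column arguments would be swapped.

```python
import numpy as np
import pandas as pd


def points_in_ring(x, y, ring):
    """Even-odd (ray casting) test of many points against one closed ring.

    x, y : 1-D arrays of point coordinates, in the same order as the ring.
    ring : sequence of (x, y) vertices whose last vertex repeats the first,
           as GeoJSON polygon rings do.
    Points lying exactly on the boundary are not classified reliably.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ring = np.asarray(ring, dtype=float)
    inside = np.zeros(x.shape, dtype=bool)
    # Loop over polygon edges (few), not over DataFrame rows (many).
    for (x1, y1), (x2, y2) in zip(ring[:-1], ring[1:]):
        # True where the edge straddles the horizontal ray through the point.
        crosses = (y1 > y) != (y2 > y)
        # Horizontal edges divide by zero here, but `crosses` masks them out.
        with np.errstate(divide="ignore", invalid="ignore"):
            x_at_y = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            hit = crosses & (x < x_at_y)
        inside ^= hit  # an odd number of crossings means "inside"
    return inside


# Exterior ring copied from the sample polygon above, i.e.
# feature['geometry']['coordinates'][0] of the first feature.
ring = [[0, 49], [0.6, 50], [0.1, 52], [-1, 51], [0, 49]]

df = pd.DataFrame({'lat': [0, 0.1, -0.1, 0.4],
                   'lng': [50, 50.1, 49.6, 49.5]})

# Assumption: the sample ring is in [lat, lng] order, so lat plays the role of x.
mask = points_in_ring(df['lat'].to_numpy(), df['lng'].to_numpy(), ring)
inside_df = df[mask]
print(inside_df)
```

On the sample data this keeps the first three rows and drops the last one. If Shapely 2.0 or later is available, I believe shapely.contains_xy(polygon, x, y) offers a comparable vectorized check on a shapely geometry, which would also cover polygons with holes.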