1

I have a shapefile containing thousands of multipolygons, and dozensof different shapefiles. Following I am providing a small example/code with some toy insatnces (A multipolygons gpds and two points gpds).

import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import Polygon

import random


# Create polygons
p1 = Polygon( [(0, 0), (0, 1), (0.5, 0.5)])
p2 = Polygon([(.75, 0), (1, .5), (.75, 1), (1.75, 0)])
p3 = Polygon([(2, 0), (2, 1), (3, 0)])
polygons = gpd.GeoDataFrame([p1, p2, p3], columns={'geometry'})
polygons['name'] = ["a", "b", "c"]


# Create red points
redPoints =  gpd.GeoDataFrame(gpd.points_from_xy( np.random.uniform(0, 3, 50), np.random.uniform(0, 1, 50)), columns = ['geometry'])

# Create green points
greenPoints =  gpd.GeoDataFrame(gpd.points_from_xy( np.random.uniform(1, 3, 50), np.random.uniform(0, 1, 50)), columns = ['geometry'])

# plot polygons, red and green points
# ax = polygons.plot()
# redPoints.plot(ax = ax, c = 'r')
# greenPoints.plot(ax = ax, c = 'g')



I need to count the number of each type of points that is inside/intersects with eahc of the multipolygons and create a column specific for each point type. I know how to do this with two shapefiles, like something as follow:


# count green (g) points
gpointsInPolygon = gpd.sjoin(polygons,greenPoints, op='intersects')
gpointsInPolygon['number of green points']=1
gpointsInPolygon = gpointsInPolygon.groupby(['name']).agg({'number of green points':sum}).reset_index()

cgps = gpointsInPolygon.set_index('name').join(polygons.set_index('name'), how ='right')
cgps.fillna(0, inplace=True)



# count red (r) points
rpointsInPolygon = gpd.sjoin(polygons,redPoints, op='intersects')
rpointsInPolygon['number of red points']=1
rpointsInPolygon = rpointsInPolygon.groupby(['name']).agg({'number of red points':sum}).reset_index()

crps = rpointsInPolygon.set_index('name').join(polygons.set_index('name'), how ='right')
crps.fillna(0, inplace=True)
#crs



redgreens = pd.merge(cgps, crps, on="geometry")

result = pd.merge(redgreens, polygons, on='geometry' )

I think this is not an efficient way of doing this (no?). As I have more than 50 large shapefiles of different type of points. I am looking for an efficient way to implement this for large point shapefiles. Thanks!

a11
  • 3,122
  • 4
  • 27
  • 66
Gerard
  • 15
  • 5
  • Downvoting because there are no reproducible, minimal dataframes provided. [This](https://stackoverflow.com/help/minimal-reproducible-example) and [this](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) are useful. – zabop Oct 02 '21 at 16:48
  • @Gerard what is it that's not working about your code? "it gets complicated" isn't enough for us to go on - as far as I can tell your code is fine. Can you be more specific about what's not working as you expect? – Michael Delgado Oct 02 '21 at 19:06
  • Thanks for the great edit! This is now a much easier to understand question. Unfortunately, my answer is that if you're doing a spatial join where you want the interseactions between many polygons and many sets of points, this is necessarily going to be a many-to-many spatial join which is just a pretty computationally expensive operation. geopandas.sjoin is the fastest way I know of to do this, so I don't think you're doing anything wrong. Best of luck! – Michael Delgado Oct 02 '21 at 23:13
  • @MichaelDelgado, Thank you! I see... I wonder if/how is there a way to implement a for loop to iterate the sjoin operation and update the result over the points shapefiles so that I don't repeat the code for each of them? – Gerard Oct 02 '21 at 23:35
  • oh - sorry - you asked about whether there is a faster way and I thought you were talking about performance. yeah you could definitely do this with a for loop – Michael Delgado Oct 02 '21 at 23:38
  • @MichaelDelgado, could you please provide more details on how to do it? I am a bit new to this gpd. – Gerard Oct 02 '21 at 23:41
  • but your gpd workflow is fine... put the point sets you want to iterate over into a list, then loop over the list elements. accumulate the results into another list. does that help? – Michael Delgado Oct 02 '21 at 23:45

1 Answers1

1

Can you combine all your points in one GeoDataFrame, have a name/category column ('red', 'blue', etc.), and then do the join just once? Something like this?

Your given data

import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import Polygon
import random
import matplotlib.pyplot as plt

# Create polygons
p1 = Polygon([(0, 0), (0, 1), (0.5, 0.5)])
p2 = Polygon([(0.75, 0), (1, 0.5), (0.75, 1), (1.75, 0)])
p3 = Polygon([(2, 0), (2, 1), (3, 0)])
polygons = gpd.GeoDataFrame([p1, p2, p3], columns={"geometry"})
polygons["name"] = ["a", "b", "c"]

# Create red points
redPoints = gpd.GeoDataFrame(
    gpd.points_from_xy(np.random.uniform(0, 3, 50), np.random.uniform(0, 1, 50)),
    columns=["geometry"],
)

# Create green points
greenPoints = gpd.GeoDataFrame(
    gpd.points_from_xy(np.random.uniform(1, 3, 50), np.random.uniform(0, 1, 50)),
    columns=["geometry"],
)

Plot it to check

fig, ax = plt.subplots()
polygons.plot(ax=ax, edgecolor="blue", color="none")
greenPoints.plot(ax=ax, c="green")
redPoints.plot(ax=ax, c="red")

enter image description here

Combine into one GeoDataFrame

greenPoints["name"] = "green"
redPoints["name"] = "red"
allPoints = pd.concat([greenPoints, redPoints], axis=0)
display(allPoints)

enter image description here

Intersect / Join polygons and points

intersect = gpd.sjoin(polygons, allPoints)
display(intersect)

enter image description here

Counting (final answer)

mycounts = (
    intersect.groupby(["name_left", "name_right"])
    .size()
    .reset_index(name="counts")
    .rename(columns={"name_left": "polygon_id", "name_right": "point_id"})
)
display(mycounts)

enter image description here

Edit Based on OP's Comment:

Yes, you can use pd.merge() if you want to add the polygon geometry information. You will have to manage your column names carefully.

In the answer above, I renamed the polygon name column to polygon_id in mycounts = .... So if you just add on to my answer, you would use the following to get the polygon geometry column.:

mycounts2 = pd.merge(
    mycounts,
    polygons.copy().rename(columns={"name": "polygon_id"}),
    how="left",
    on="polygon_id",
)

The polygons.copy().rename(... part of this

  1. Makes a copy of the polygons GeoDataFrame (this is important to make sure you don't mutate the original GeoDataFrame!), and
  2. Renames the column.

Also note that you can't add the geometry of the points to mycounts because the mycounts DataFrame is the result of a groupby. If you need the geometry of the points, then you will need to do something like the following. Again, you will have to figure out the column name management for your dataset.

pts2 = pd.merge(
    allPoints,
    intersect.copy().rename(columns={"name_right": "name"}),
    how="left",
    on="name",
)

new_cols = {
    "geometry_x": "geom_pt",
    "name": "id_pt",
    "geometry_y": "geom_poly",
    "name_left": "id_poly",
    "index_right": "idx_pt",
}

pts2.rename(columns=new_cols, inplace=True)
a11
  • 3,122
  • 4
  • 27
  • 66
  • Yes, it is possible. Thanks for the reply! What if I want to have the geometry column in the mycounts? do I need peform a merge operation? or can it be added to the "Counting (final answer)" part? – Gerard Oct 03 '21 at 13:47
  • 1
    @Gerard yes, answer updated – a11 Oct 04 '21 at 16:12