
I have two shapefiles of a city. The first one is extremely detailed, down to the level of blocks, and has several pieces of information about each block, including the population density. The second one is the same city divided into a square grid of 1.45 km² cells, with no other information.

I want to calculate the population density in each cell of the square grid. I tried with

import geopandas as gpd

enriched = gpd.read_file('enriched.shp')  # gdf with pop density info
grid = gpd.read_file('grid.shp')          # grid gdf

# merge grid with enriched shp
popd = gpd.sjoin(grid[['cell_id', 'geometry']], enriched, op='intersects')
# group by cell and sum the densities of the blocks within
popd = popd[['cell_id', 'popdens']].groupby(['cell_id']).sum().reset_index()
grid = grid.merge(popd, on='cell_id', how='left').fillna(0)

but I am not sure this is the proper way, since I am getting very high density values in some cells (> 200k per km²). Is this right? How can I check that I am not missing anything?
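
One quick sanity check is to look at how many blocks each cell matches and compare the value ranges before and after the aggregation. This is a minimal sketch reusing the names above; note that a spatial join with 'intersects' assigns a block to every cell it touches, so a block straddling two cells contributes its full density to both.

import geopandas as gpd

enriched = gpd.read_file('enriched.shp')
grid = gpd.read_file('grid.shp')

# The two layers must share a CRS, otherwise the spatial join is meaningless
print(enriched.crs, grid.crs)

# How many blocks does each cell intersect? Summing the densities of many
# small blocks is what produces the very large per-cell values.
joined = gpd.sjoin(grid[['cell_id', 'geometry']], enriched, op='intersects')
print(joined.groupby('cell_id').size().describe())

# Compare the range of the block densities with the range of the per-cell sums
print(enriched['popdens'].describe())
print(joined.groupby('cell_id')['popdens'].sum().describe())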

EDIT: Here are the column headers of the two shapefiles

enriched.columns

Index(['REGION', 'PROVINCIA', 'COMUNA', 'COD_DISTRI', 'COD_ZONA', 'area', 'popdens', 'geometry'],
      dtype='object')

enriched.head(2)

   REGION  PROVINCIA  COMUNA  COD_DISTRI  COD_ZONA      area     popdens                                           geometry
0      13        131   13121         2.0       1.0  0.442290  4589.75053  POLYGON ((-70.65571 -33.47856, -70.65575 -33.4...
1      13        131   13121         6.0       1.0  0.773985  7661.64421  POLYGON ((-70.68182 -33.47654, -70.68144 -33.4...

Don't worry about the first five columns; you can see them as a primary key in the dataset: all together they uniquely identify a zone.

grid.columns

Index(['cell_id', 'geometry'], dtype='object')

grid.head(2)

    cell_id geometry
0   sq00024 POLYGON ((-70.79970 -33.50447, -70.78894 -33.5...
1   sq00025 POLYGON ((-70.79989 -33.51349, -70.78913 -33.5...
sato
  • Could you update your post with some lines of each shapefile? – Paulo Marques Dec 17 '20 at 20:32
  • You can't sum population densities. You have to take an area-weighted average (I mean you _can_ sum them, but the calculation isn't valid) – Paul H Dec 18 '20 at 00:35
  • @PaulH uhm, you have a point. Could you elaborate your comment in an answer? I am editing the post to include the headers of the shapefiles, as requested by Paulo – sato Dec 18 '20 at 16:30
  • Area-weighted averages are pretty basic to calculate. Wikipedia should be a sufficient reference – Paul H Dec 18 '20 at 17:06
  • one of the top results on this site by search `[pandas] weighted average`: https://stackoverflow.com/questions/60122005/how-to-compute-weighted-average – Paul H Dec 18 '20 at 17:07
  • @sato First the polygon shapefile should be clipped with the grid shapefile, then the clipped polygons aggregated per grid cell by pop density. – BetterCallMe Dec 29 '20 at 10:33
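
A minimal sketch of the clip-and-aggregate, area-weighted approach suggested in the comments above. It assumes both layers can be reprojected to a metric CRS; EPSG:32719 (UTM 19S) matches the coordinates shown in the sample rows, but verify it fits your data.

import geopandas as gpd

enriched = gpd.read_file('enriched.shp')
grid = gpd.read_file('grid.shp')

# Reproject to a metric CRS so .area returns square metres
enriched = enriched.to_crs(epsg=32719)
grid = grid.to_crs(epsg=32719)

# Clip every block to the grid cell(s) it overlaps
pieces = gpd.overlay(grid[['cell_id', 'geometry']],
                     enriched[['popdens', 'geometry']],
                     how='intersection')
pieces['piece_area'] = pieces.geometry.area

# Area-weighted average density per cell: sum(popdens_i * area_i) / sum(area_i)
pieces['weighted'] = pieces['popdens'] * pieces['piece_area']
agg = (pieces.groupby('cell_id')
             .agg(weighted=('weighted', 'sum'),
                  piece_area=('piece_area', 'sum'))
             .reset_index())
agg['popdens'] = agg['weighted'] / agg['piece_area']

grid = grid.merge(agg[['cell_id', 'popdens']], on='cell_id', how='left').fillna(0)

Dividing by the summed piece area averages only over the parts of a cell actually covered by blocks; divide by the nominal cell area instead if the uncovered parts should count as zero density.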

0 Answers