I have two shapefiles of a city. The first one is extremely detailed, down to the level of blocks, that has several information about each block, including the population density. The second one is the same city divided into a squared grid of 1.45km2 cells, with no other information.
I want to calculate the population density in each cell of the squared grid. I tried with
enriched=gpd.read_file('enriched.shp') #gdf with pop density info
grid=gpd.read_file('grid.shp') #grid gdf
popd=gpd.sjoin(grid[['cell_id','geometry']],enriched, op='intersects') #merge grid with enriched shp
popd=popd[['cell_id','popdens']].groupby(['cell_id']).sum().reset_index() #groupby cell and sum the densities of the blocks within
grid=grid.merge(popd,on='cell_id', how='left').fillna(0)
but I am not sure this is the proper way, since I am getting very high density values in some cells (like > 200k per km2). Is this right? how can I check if I am not missing anything?
EDIT: Here are the column headers of the two shapefiles
enriched.columns
Index(['REGION', 'PROVINCIA', 'COMUNA', 'COD_DISTRI', 'COD_ZONA', 'area', 'popdens', 'geometry'],
dtype='object')
enriched.head(2)
REGION PROVINCIA COMUNA COD_DISTRI COD_ZONA area popdens geometry
0 13 131 13121 2.0 1.0 0.442290 4589.75053 POLYGON ((-70.65571 -33.47856, -70.65575 -33.4...
1 13 131 13121 6.0 1.0 0.773985 7661.64421 POLYGON ((-70.68182 -33.47654, -70.68144 -33.4...
don't worry about the first 5 columns, you can see them as a primary key in the dataset: all together they uniquely identify a zone.
grid.columns
Index(['cell_id', 'geometry'], dtype='object')
grid.head(2)
cell_id geometry
0 sq00024 POLYGON ((-70.79970 -33.50447, -70.78894 -33.5...
1 sq00025 POLYGON ((-70.79989 -33.51349, -70.78913 -33.5...