Or: The search for a faster and more accurate way to rasterize small OpenStreetMap extracts into population-weighted rasters.
I'd like to turn a small .pbf file into a GeoTiff which will be easier to do further spatial analysis on. For the purpose of this question I will constrain the requirements to dealing with polygon geometry since I already found a solution that works very well for lines. It works so well that I am considering converting all my polygons into lines.
To give an example of the type of data that I'd like to convert:
wget https://download.geofabrik.de/europe/liechtenstein-latest.osm.pbf
osmium tags-filter liechtenstein-latest.osm.pbf landuse=grass -o liechtenstein_grass.pbf
ogr2ogr -t_srs EPSG:3857 liechtenstein_grass.sqlite -dsco SPATIALITE=YES -nln multipolygons -nlt POLYGON -skipfailures liechtenstein_grass.pbf
I found a zonal statistics script here which we might be able to build from to solve this problem: http://pcjericks.github.io/py-gdalogr-cookbook/raster_layers.html#calculate-zonal-statistics
The above script takes a vector layer and a raster layer, iterates on the vector features by clipping the raster and doing some statistics on that.
Instead of normal zonal statistics I would like to count the number of vector features which intersect with each raster pixel. I have a global raster grid Int32 with a unique value for each pixel.
{qgis_process} run native:creategrid -- TYPE=2 EXTENT="-20037760, -8399416, 20037760, 18454624 [EPSG:3857]" HSPACING=1912 VSPACING=1912 HOVERLAY=0 VOVERLAY=0 CRS="EPSG:3857" OUTPUT="grid.gpkg"
sqlite3 land.gpkg
SELECT load_extension("mod_spatialite");
alter table output add column ogcfod int;
update output set ogcfod = fid;
gdal_rasterize -l output -a ogcfod -tap -tr 1912.0 1912.0 -a_nodata 0.0 -ot Int32 -of GTiff -co COMPRESS=DEFLATE -co PREDICTOR=2 grid.gpkg grid.tif -te -20037760 -8399416 20037760 18454624
So I'm thinking if we could still iterate on the vector features (as there are far, far fewer of those and there are 88m+ zones in the raster grid) it will probably be much more performant.
We want a script script which takes a vector layer and a raster layer, iterates on the vector features looks up the values of all the pixels the feature covers and then adds one to a dictionary: {px_id: qty}
However, when trying to make this script work it keeps giving me the same geometry... It only shows me 1 of the pixel IDs over and over
import sys
import gdal
import numpy
import ogr
import osr
from rich import inspect, print
def zonal_stats(feat, lyr, raster):
# Get raster georeference info
transform = raster.GetGeoTransform()
xOrigin = transform[0]
yOrigin = transform[3]
pixelWidth = transform[1]
pixelHeight = transform[5]
# Reproject vector geometry to same projection as raster
sourceSR = lyr.GetSpatialRef()
targetSR = osr.SpatialReference()
targetSR.ImportFromWkt(raster.GetProjectionRef())
coordTrans = osr.CoordinateTransformation(sourceSR, targetSR)
feat = lyr.GetNextFeature()
geom = feat.GetGeometryRef()
geom.Transform(coordTrans)
# Get extent of feat
geom = feat.GetGeometryRef()
if geom.GetGeometryName() == "MULTIPOLYGON":
count = 0
pointsX = []
pointsY = []
for polygon in geom:
geomInner = geom.GetGeometryRef(count)
ring = geomInner.GetGeometryRef(0)
numpoints = ring.GetPointCount()
for p in range(numpoints):
lon, lat, z = ring.GetPoint(p)
pointsX.append(lon)
pointsY.append(lat)
count += 1
elif geom.GetGeometryName() == "POLYGON":
ring = geom.GetGeometryRef(0)
numpoints = ring.GetPointCount()
pointsX = []
pointsY = []
for p in range(numpoints):
lon, lat, z = ring.GetPoint(p)
pointsX.append(lon)
pointsY.append(lat)
else:
sys.exit("ERROR: Geometry needs to be either Polygon or Multipolygon")
xmin = min(pointsX)
xmax = max(pointsX)
ymin = min(pointsY)
ymax = max(pointsY)
print(xmin, xmax)
print(ymin, ymax)
# Specify offset and rows and columns to read
xoff = int((xmin - xOrigin) / pixelWidth)
yoff = int((yOrigin - ymax) / pixelWidth)
xcount = int((xmax - xmin) / pixelWidth) + 1
ycount = int((ymax - ymin) / pixelWidth) + 1
# Create memory target raster
target_ds = gdal.GetDriverByName("MEM").Create(
"", xcount, ycount, 1, gdal.GDT_Int32
)
target_ds.SetGeoTransform(
(
xmin,
pixelWidth,
0,
ymax,
0,
pixelHeight,
)
)
# Create for target raster the same projection as for the value raster
raster_srs = osr.SpatialReference()
raster_srs.ImportFromWkt(raster.GetProjectionRef())
target_ds.SetProjection(raster_srs.ExportToWkt())
# Rasterize zone polygon to raster
gdal.RasterizeLayer(target_ds, [1], lyr, burn_values=[1])
# Read raster as arrays
banddataraster = raster.GetRasterBand(1)
dataraster = banddataraster.ReadAsArray(xoff, yoff, xcount, ycount)
bandmask = target_ds.GetRasterBand(1)
datamask = bandmask.ReadAsArray(0, 0, xcount, ycount)
print(dataraster)
# Mask zone of raster
# zoneraster = numpy.ma.masked_array(dataraster, numpy.logical_not(datamask))
# print(zoneraster)
# exit()
def loop_zonal_stats(lyr, raster):
featList = range(lyr.GetFeatureCount())
statDict = {}
for FID in featList:
feat = lyr.GetFeature(FID)
meanValue = zonal_stats(feat, lyr, raster)
statDict[FID] = meanValue
return statDict
def main(input_zonal_raster, input_value_polygon):
raster = gdal.Open(input_zonal_raster)
shp = ogr.Open(input_value_polygon)
lyr = shp.GetLayer()
return loop_zonal_stats(lyr, raster)
if __name__ == "__main__":
if len(sys.argv) != 3:
print(
"[ ERROR ] you must supply two arguments: input-zone-raster-name.tif input-value-shapefile-name.shp "
)
sys.exit(1)
main(sys.argv[1], sys.argv[2])
Prior research:
- https://gis.stackexchange.com/questions/177738/count-overlapping-polygons-including-duplicates
- https://stackoverflow.com/a/47443399/697964
- If gdal_rasterize could burn in the count of all the polygons which intersect with each pixel (rather than a fixed value) that would likely fulfill my needs.
- https://github.com/rory/osm-summary-heatmap/blob/main/Makefile
- https://old.reddit.com/r/gis/comments/4n2q5v/count_overlapping_polygons_qgis/
- heatmapkerneldensity does not work very well or maybe I'm not using it correctly but it seems off
{qgis_process} run qgis:heatmapkerneldensityestimation -- INPUT="{basen}.json.geojson" RADIUS=2868 RADIUS_FIELD=None PIXEL_SIZE=1912 WEIGHT_FIELD=None KERNEL=4 DECAY=0 OUTPUT_VALUE=0 OUTPUT="{basen}.tif