1

I have two tables with data. I need to set data building_h from one table to another with respect to the spatial relationships between data in these tables. Spatial data in the source table is polygon type and in destination table - points. I made a query for points and polygons which intersect. It works nice. But there are some points that do not intersect with polygons. My idea is to define for each of these points the closest polygon and take the building_h value from it. For this purpose, I wrote the function which takes the id of the point and returns the building_h value. And it works ok in the test.

CREATE OR REPLACE FUNCTION closest_pol(int4)
RETURNS NUMERIC 
AS
$$
DECLARE
    retVal NUMERIC;
BEGIN
    SELECT bgs.building_h INTO retVal
                FROM buildings_geoalert_spgg bgs, building_from_landuse_spgg bfl 
                WHERE ST_INTERSECTS(bgs.geom, ST_BUFFER(bfl.geom_centr, 0.0006,'quad_segs=8')) AND bfl.id = $1
                ORDER BY ST_INTERSECTION(bgs.geom, ST_BUFFER(bfl.geom_centr, 0.0006,'quad_segs=8')) ASC 
                LIMIT 1;
RETURN retVal;
END;
$$
LANGUAGE plpgsql 
   STABLE 
RETURNS NULL ON NULL INPUT;

But when I run the query for the whole table, it executes forever.. (for ~3000 rows with NULL value). It lasted several hours before I stopped it.

UPDATE building_from_landuse_spgg AS bfl SET 
    building_h = (SELECT closest_pol(bfl.id))
    WHERE bfl.building_h IS NULL;

Do you have an idea what I did wrong?

Jim Jones
  • 18,404
  • 3
  • 35
  • 44
gregoriiv
  • 13
  • 4

1 Answers1

1

In your WHERE clause you're using ST_Intersects with 0.006 buffer that is created in query time. Consider using a partial gist index with this buffer of yours (if for whatever reason you must use them):

CREATE INDEX idx_buffer ON building_from_landuse_spgg USING gist (ST_Buffer(geom_centr, 0.0006,'quad_segs=8'));

I'd argue that you do not need a function for that, since you could use the query inside of the function as a subquery in your UPDATE statement. But if you have your reasons to stick to the function, you can get rid of the SELECT to call it:

UPDATE building_from_landuse_spgg 
SET building_h = closest_pol(id)
WHERE building_h IS NULL;

EDIT: As correctly mentioned by @JGH (see comments), using buffers is less efficient than simply using ST_DWithin. So, if you can afford it, create an index in the geometries used in the WHERE clause ..

CREATE INDEX idx_landuse_geom_centr ON building_from_landuse_spgg USING gist (geom_centr);
CREATE INDEX idx_geoalert_geom ON buildings_geoalert_spgg USING gist (geom);

.. and also partially indexing building_h would speed up things a bit, since you're only interested in the NULL records:

CREATE INDEX idx_landuse_building_h ON building_from_landuse_spgg (building_h) 
WHERE building_h IS NULL;

Or if you prefer a broader index but still putting NULL values in the first class ..

CREATE INDEX idx_landuse_building_h ON building_from_landuse_spgg 
  (building_h NULLS FIRST);

And perhaps consider putting the code of your function in a subquery, e.g.

UPDATE building_from_landuse_spgg AS bfl 
SET building_h = (
  SELECT bgs.building_h 
  FROM   buildings_geoalert_spgg bgs 
  WHERE  ST_DWithin(bgs.geom, bfl.geom_centr, 0.0006)
  ORDER BY ST_Distance(bgs.geom, bfl.geom_centr) ASC LIMIT 1)
WHERE bfl.building_h IS NULL;

Further reading:

Jim Jones
  • 18,404
  • 3
  • 35
  • 44
  • 1
    using `st_dwithin` would be much more efficient and would lead to more accurate results, as a buffer is an approximation only. – JGH Mar 18 '21 at 13:02
  • @JGH I totally agree. I just sticked to the function used in the OP, but you're right.. st-dwithin is a much better choice :) cheers – Jim Jones Mar 18 '21 at 14:11