Is there a way to set multiple column values on a single spatial query?

Question

I am associating attributes with points using PostGIS's ST_DWIthin().

UPDATE scratch.intersections AS i
SET legs = (
    SELECT COUNT(r.geom)
    FROM received.streets r
    WHERE ST_DWithin(i.geom, r.geom, 2));

UPDATE scratch.intersections AS i
SET streets = (
    SELECT ARRAY_AGG(DISTINCT r.NAME ORDER BY r.NAME) 
    filter (WHERE r.NAME IS NOT NULL)
    FROM received.streets r
    WHERE ST_DWithin(i.geom, r.geom, 2));

Seems like it should be possible to update multiple columns with a single spatial query, but I can't think of a way to structure it since I can only update a single column at a time.

Is there a way to turn these two queries into a single query, requiring only one spatial calculation?

Would it be more efficient to do an INNER JOIN creating a new temp table with a record for each line within 2 of a point, and then set values off of that table? Describing it, it sounds less efficient since performance of ST_DWithin() isn't terrible using indexes.

Does the count go into `streets` or the array of names? How often is this `UPDATE` going to run? For the whole table? I might have more advice. — Erwin Brandstetter, May 01 '20 at 22:30
didn't even post the right code snippet sorry, updated. Its just setting legs to equal the count of streets going into the intersection and then building an array of the street names. I run queries like this pretty often but an indidivudal query only runs a couple times, always for the full table. — Hugh_Kelley, May 02 '20 at 16:14

score 2 · Accepted Answer · answered May 01 '20 at 20:36

2

You can update multiple columns in one statement by enclosing them in parentheses

UPDATE myTable
SET (a,b) = (
  select c,d 
  from anotherTable 
  WHERE st_dwithin(mytable.geom, anotherTable.geom,2)
);

answered May 01 '20 at 20:36

JGH

15,928
4
31
48

Erwin Brandstetter · Answer 2 · 2020-05-28T13:16:33.910

JGH provided a smart way to UPDATE multiple columns at once from a (correlated) subquery. Possible since Postgres 9.5. However, there are considerable downsides.

That form updates every single row in the table unconditionally. You would have to repeat the calculation in the WHERE clause to do it conditionally, which would defy the purpose of doing it once only.

In PostgreSQL's MVCC model UPDATE basically means writing a new version of the complete row, bloating table and indexes. Expensive by itself, but also causing a lot of additional work for VACUUM - and/or performance degradation if you can't or don't clean up. See:

Does updating a row with the same value actually update the row?

Either you really want to update every row, then that's basically fine. Though, if you are not bound by concurrent load or internal references to the table (views, FKs, ...), consider writing a new table instead. Typically cheaper overall.

However, some (or many/most?) rows may be up to date already. Then that's a huge waste. Consider instead:

UPDATE scratch.intersections AS i
SET    legs    = r.legs  
     , streets = r.streets
FROM   scratch.intersections x
JOIN   LATERAL (
   SELECT count(*) AS legs  -- assuming r.geom is NOT NULL
        , array_agg(DISTINCT s.NAME ORDER BY s.NAME)
                    FILTER (WHERE s.NAME IS NOT NULL) AS streets
   FROM   received.streets s
   WHERE  ST_DWithin(x.geom, s.geom, 2)
   ) r ON x.legs    IS DISTINCT FROM r.legs
       OR x.streets IS DISTINCT FROM r.streets 
WHERE i.id = x.id;  -- id being the PK (or at least UNIQUE)

This does not touch rows that don't actually change.

We need the additional instance of scratch.intersections in the FROM clause to allow the LATERAL join. But we eliminate all rows without changes before applying the update - this way saving most of the work for every row that's already up to date. Well, since the subquery is relatively expensive, maybe not most of the work. But typically, the actual write operation is much more expensive than calculating new values.

This is pretty close to what you had in mind in your last paragraph. But more efficient without creating a temp table, and you also save empty writes.

And if you only need to update possibly affected rows in intersections after changing few rows in streets.geom, there is more optimization potential, yet. Not going there, seems beyond the scope of this question.

Is there a way to set multiple column values on a single spatial query?

2 Answers2