
I have a problem when I try to update many rows at the same time.

Here is the table and query I use (simplified for better reading):

table

CREATE TABLE foo
(
    pkid integer,
    x integer,
    y integer
)

query

UPDATE foo SET x=t.x, y=t.y FROM
(VALUES (50, 50, 1),
        (100, 120, 2))
AS t(x, y, pkid) WHERE foo.pkid=t.pkid

This query works perfectly, but when I try to execute a query where all x or y values are null, I get an error:

query with nulls

UPDATE foo SET x=t.x, y=t.y FROM
(VALUES (null, 20, 1),
        (null, 50, 2))
AS t(x, y, pkid) WHERE foo.pkid=t.pkid

error

ERROR:  column "x" is of type integer but expression is of type text
LINE 1: UPDATE foo SET x=t.x FROM

The only way to fix that is to cast at least one of the values explicitly, e.g. change (null, 20, 1) to (null::int, 20, 1), but I can't do that, since I have a function which generates these "update multiple rows" queries and it doesn't know anything about the column types.

What's the best solution here? Is there any better update query for multiple rows? Is there any function or syntax like AS t(x:gettype(foo.x), y:gettype(foo.y), pkid:gettype(foo.pkid))?

AlbertFerras

3 Answers


With a standalone VALUES expression, PostgreSQL has no idea what the data types should be. With simple numeric literals the system is happy to assume matching types. But with other input (like NULL) you need to cast explicitly - as you have already found out.

You can query pg_catalog (fast, but PostgreSQL-specific) or the information_schema (slow, but standard SQL) to find out and prepare your statement with appropriate types.
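
For example, a minimal sketch of the pg_catalog route (my addition, assuming the table foo is visible on the current search_path):

-- Fetch column names and types from the system catalog (fast, PostgreSQL-specific)
SELECT a.attname                            AS column_name
     , format_type(a.atttypid, a.atttypmod) AS column_type
FROM   pg_attribute a
WHERE  a.attrelid = 'foo'::regclass
AND    a.attnum > 0           -- skip system columns
AND    NOT a.attisdropped     -- skip dropped columns
ORDER  BY a.attnum;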

Or you can use one of these simple "tricks" (I saved the best for last):

0. Select row with LIMIT 0, append rows with UNION ALL VALUES

UPDATE foo f
SET    x = t.x
     , y = t.y
FROM  (
  (SELECT pkid, x, y FROM foo LIMIT 0) -- parenthesis needed with LIMIT
   UNION ALL
   VALUES
      (1, 20, NULL)  -- no type casts here
    , (2, 50, NULL)
   ) t               -- column names and types are already defined
WHERE  f.pkid = t.pkid;

The first sub-select of the subquery:

(SELECT pkid, x, y FROM foo LIMIT 0)

gets names and types for the columns, while LIMIT 0 prevents it from adding an actual row. Subsequent rows are coerced to the now well-defined row type and are immediately checked against it - a subtle additional improvement over your original form.

When providing values for all columns of the table, this short syntax can be used for the first row:

(TABLE foo LIMIT 0)
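
For the example table, a sketch of option 0 using that shorthand (the VALUES rows must then supply all columns, in the table's default order):

UPDATE foo f
SET    x = t.x
     , y = t.y
FROM  (
   (TABLE foo LIMIT 0)   -- shorthand for SELECT * FROM foo LIMIT 0
   UNION ALL
   VALUES
      (1, 20, NULL)      -- all columns (pkid, x, y) in default order
    , (2, 50, NULL)
   ) t
WHERE  f.pkid = t.pkid;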

Major limitation: Postgres casts the input literals of the free-standing VALUES expression to a "best-effort" type immediately. When it later tries to cast to the given types of the first SELECT, it may already be too late for some types if there is no registered assignment cast between the assumed type and the target type. Examples: text -> timestamp or text -> json.

Pro:

  • Minimum overhead.
  • Readable, simple and fast.
  • You only need to know relevant column names of the table.

Con:

  • Type resolution can fail for some types.

1. Select row with LIMIT 0, append rows with UNION ALL SELECT

UPDATE foo f
SET    x = t.x
     , y = t.y
FROM  (
  (SELECT pkid, x, y FROM foo LIMIT 0) -- parenthesis needed with LIMIT
   UNION ALL SELECT 1, 20, NULL
   UNION ALL SELECT 2, 50, NULL
   ) t               -- column names and types are already defined
WHERE  f.pkid = t.pkid;

Pro:

  • Like 0., but avoids failing type resolution.

Con:

  • UNION ALL SELECT is slower than a VALUES expression for long lists of rows, as you found in your test.
  • Verbose syntax per row.

2. VALUES expression with per-column type

...
FROM  (
   VALUES 
     ((SELECT pkid FROM foo LIMIT 0)
    , (SELECT x    FROM foo LIMIT 0)
    , (SELECT y    FROM foo LIMIT 0))  -- get type for each col individually
   , (1, 20, NULL)
   , (2, 50, NULL)
   ) t (pkid, x, y)  -- column names not defined yet, only types.
...

Contrary to 0. this avoids premature type resolution.

The first row in the VALUES expression is a row of NULL values which defines the type for all subsequent rows. This leading noise row is filtered by WHERE f.pkid = t.pkid later, so it never sees the light of day. For other purposes you can eliminate the added first row with OFFSET 1 in a subquery.
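
Filled in against the example table, the complete statement for this variant would look roughly like this:

UPDATE foo f
SET    x = t.x
     , y = t.y
FROM  (
   VALUES
     ((SELECT pkid FROM foo LIMIT 0)
    , (SELECT x    FROM foo LIMIT 0)
    , (SELECT y    FROM foo LIMIT 0))  -- leading row: NULLs carrying the column types
   , (1, 20, NULL)
   , (2, 50, NULL)
   ) t (pkid, x, y)
WHERE  f.pkid = t.pkid;                -- also filters out the leading NULL row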

Pro:

  • Typically faster than 1. (or even 0.)
  • Short syntax for tables with many columns where only a few are relevant.
  • You only need to know relevant column names of the table.

Con:

  • Verbose syntax for only a few rows.
  • Less readable (IMO).

3. VALUES expression with row type

UPDATE foo f
SET x = (t.r).x         -- parenthesis needed to make syntax unambiguous
  , y = (t.r).y
FROM (
   VALUES
      ('(1,20,)'::foo)  -- columns need to be in default order of table
     ,('(2,50,)')       -- nothing after the last comma for NULL
   ) t (r)              -- column name for row type
WHERE  f.pkid = (t.r).pkid;

You obviously know the table name. If you also know the number of columns and their order you can work with this.

For every table in PostgreSQL a row type is registered automatically. If you match the number of columns in your expression, you can cast to the row type of the table ('(1,20,)'::foo), thereby assigning column types implicitly. Put nothing after a comma to enter a NULL value. Add a comma for every irrelevant trailing column.
In the next step you can access individual columns with the demonstrated syntax. More about Field Selection in the manual.
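
To see what the row literal for a given table looks like (also suggested in the comments below), simply select whole rows:

SELECT f FROM foo f LIMIT 10;
-- a row with pkid=1, x=20, y=NULL prints as: (1,20,)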

Or you could add a row of NULL values and use uniform syntax for actual data:

...
  VALUES
      ((NULL::foo))  -- row of NULL values
    , ('(1,20,)')    -- uniform ROW value syntax for all
    , ('(2,50,)')
...

Pro:

  • Fastest (at least in my tests with few rows and columns).
  • Shortest syntax for few rows or tables where you need all columns.
  • You don't have to spell out columns of the table - all columns automatically have the matching name.

Con:

  • Not so well known syntax for field selection from record / row / composite type.
  • You need to know number and position of relevant columns in default order.

4. VALUES expression with decomposed row type

Like 3., but with decomposed rows in standard syntax:

UPDATE foo f
SET    x = t.x
     , y = t.y
FROM (
   VALUES
      (('(1,20,)'::foo).*)  -- decomposed row of values
    , (2, 50, NULL)
   ) t(pkid, x, y)  -- arbitrary column names (I made them match)
WHERE  f.pkid = t.pkid;

Or, with a leading row of NULL values again:

...
   VALUES
      ((NULL::foo).*)  -- row of NULL values
    , (1, 20, NULL)    -- uniform syntax for all
    , (2, 50, NULL)
...

Pros and cons like 3., but with more commonly known syntax.
And you need to spell out column names (if you need them).

5. VALUES expression with types fetched from row type

As Unril commented, we can combine the virtues of 2. and 4. to provide only a subset of columns:

UPDATE foo f
SET   (  x,   y)
    = (t.x, t.y)  -- short notation, see below
FROM (
   VALUES
      ((NULL::foo).pkid, (NULL::foo).x, (NULL::foo).y)  -- subset of columns
    , (1, 20, NULL)
    , (2, 50, NULL)
   ) t(pkid, x, y)       -- arbitrary column names (I made them match)
WHERE  f.pkid = t.pkid;

Pros and cons like 4., but we can work with any subset of columns and don't have to know the full list.

The example also uses the short multi-column syntax for the UPDATE itself, which is convenient for cases with many columns.

4. and 5. are my favorites.

db<>fiddle here - demonstrating all

Erwin Brandstetter
  • It doesn't work: `update foo SET x=t.x, y=t.y FROM ( (SELECT x, y, pkid FROM foo LIMIT 0) UNION ALL VALUES (NULL,5, 1),(NULL,2, 2)) t WHERE foo.pkid=t.pkid` says `UNION types integer and text cannot be matched`. It has the same problem I posted in the question :( – AlbertFerras Sep 14 '12 at 15:34
  • It works, but the performance is very bad compared with what I had :( With your version, 250 updates with 2000 rows each: 32.30sec. With the version I had (adding y::integer casting as suggested by @jspboix): 15.77sec. That's twice as slow :( It's a nice trick anyway ;) – AlbertFerras Sep 14 '12 at 16:10
  • @AlbertFerras: `UNION ALL SELECT` is slower with large numbers of rows. (I see from your other comment: you have thousands.) I added two faster alternatives with pros and cons. Should be something fast for you in there now. – Erwin Brandstetter Sep 14 '12 at 23:24
  • The idea of a union to an empty select from the table is very useful and I'm sure I will have plenty of opportunities to use it. It is one of those _"how didn't I ever think of that?"_ things. The second version, per-column type, which you say is faster, adds value to it. – Clodoaldo Neto Sep 15 '12 at 11:04
  • @AlbertFerras If this is to run in a background batch, then seconds or even minutes do not justify the more complex approach over the (very good) suggestion by jspboix. I think the KISS principle wins here. – Clodoaldo Neto Sep 15 '12 at 11:10
  • I like the 2nd one and will try it later. The 3rd one is not good for me since my batch script updates a subset of the table columns (and I don't know all of the column types); also I'd have to use some tools to write strings with special characters and escape them, which could cause difficulties (I have bytea, geometry column types.. how do you write that?). Looks good, but generating this '(1,23,)'::foo string looks bad to me when you have non-trivial types. I will make some tests with your 2nd version when I have some time :) – AlbertFerras Sep 17 '12 at 11:05
  • @AlbertFerras: To be clear, you don't need to know *any* types and only names of relevant columns for the third solution. What you need to know is the number of columns and the position of relevant columns in default order. If you wonder about input syntax for a particular column, follow my advice in the answer and look at the output of `SELECT f from foo f LIMIT 10`. – Erwin Brandstetter Sep 17 '12 at 11:12
  • Well, I still can't use it because I don't know the "position of relevant columns in default order" without making a query. If I have to make a new query I'd prefer @jspboix's answer since it's simpler. – AlbertFerras Sep 17 '12 at 11:18
  • The 2nd version works perfectly, with the same speed as the one I had :) Since I don't need to make a new query to get the column types (and cache them in my script etc..) this will be the accepted answer. If someday I need something even faster I will try the 3rd version you've proposed. Thank you :) – AlbertFerras Sep 17 '12 at 12:18
  • You can also use `values ((null::test.city).id, (null::test.city).name), (2, 'City name')` syntax instead of `(select id from test.city limit 0)` subqueries. – Unril May 24 '18 at 11:39
  • @Unril: I like that variant and added an option ***5.*** above. – Erwin Brandstetter May 24 '18 at 13:12

If you have a script generating the query, you could extract and cache the data type of each column and create the type casts accordingly. E.g.:

SELECT column_name,data_type,udt_name 
FROM information_schema.columns 
WHERE table_name = 'foo';

From udt_name you can derive the necessary cast, as you explained in the last paragraph of your question. Additionally, you could do this:

UPDATE foo
SET x = t.x
FROM (VALUES(null::int4,756),(null::int4,6300))
AS t(x,pkid)
WHERE foo.pkid = t.pkid;
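
For instance, a hypothetical helper query (my addition, column names from the example table) that turns the catalog information into ready-made cast strings:

-- Build a 'null::<type>' cast string per column of "foo"; the generating
-- script can cache these and use them wherever a NULL literal is needed.
SELECT column_name
     , 'null::' || udt_name AS null_cast
FROM   information_schema.columns
WHERE  table_name = 'foo';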
jspboix
  • I like that solution since I only need to make one query for all of the "update multiple rows" queries. For now I accept this if nothing better comes up :) – AlbertFerras Sep 14 '12 at 15:14

Your script will create a temporary table from foo. It will have the same data types as foo. Use an impossible condition so it is empty:

select x, y, pkid
into temp t
from foo
where pkid = -1

Make your script insert into it:

insert into t (x, y, pkid) values
(null, 20, 1),
(null, 50, 2)

Now update from it:

update foo 
set x=t.x, y=t.y 
from t
where foo.pkid=t.pkid

Finally drop it:

drop table t
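
A possible variation (my addition, not part of this answer): if the whole batch runs inside a single transaction, ON COMMIT DROP spares the explicit drop:

begin;

create temp table t on commit drop as
select x, y, pkid from foo where pkid = -1;

-- ... insert the values and run the update as above ...

commit;  -- the temp table disappears here automatically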
Clodoaldo Neto
  • If I ran the multi-row update once it would be fine, but I run about 1000 queries with ~2000 rows in each one. I don't like the idea of creating and dropping a table for each of those 1000 queries. Also, I don't want to have a `prepare_update(), .. updates ..., end_update()` because it would make the code more complex and I'd need to change a lot of my previous code too. – AlbertFerras Sep 14 '12 at 15:18