12

Writable CTEs to emulate an UPSERT in PostgreSQL seem quite elegant. (Until we get actual upsert/merge in Postgres. See: https://stackoverflow.com/a/8702291/558819)

Still working with PostgreSQL 9.3, there is one problem: how can I insert the default value? Using NULL won't help of course as NULL gets explicitly inserted as NULL, unlike for example with MySQL. An example:

WITH new_values (id, playlist, item, group_name, duration, sort, legacy) AS (
    VALUES (651, 21, 30012, 'a', 30, 1, FALSE)
    ,      (NULL::int, 21, 1, 'b', 34, 2, NULL::boolean)
    ,      (668, 21, 30012, 'c', 30, 3, FALSE)
    ,      (7428, 21, 23068, 'd', 0, 4, FALSE)
), upsert AS (
    UPDATE playlist_items m
    SET    (playlist, item, group_name, duration, sort, legacy)
       = (nv.playlist, nv.item, nv.group_name, nv.duration, nv.sort, nv.legacy)
    FROM   new_values nv
    WHERE  nv.id = m.id
    RETURNING m.id
)
INSERT INTO playlist_items (playlist, item, group_name, duration, sort, legacy)
SELECT playlist, item, group_name, duration, sort, legacy
FROM   new_values nv
WHERE  NOT EXISTS (SELECT 1
                   FROM   upsert m
                   WHERE  nv.id = m.id)
RETURNING id

So I'd like for example for the legacy column to take on its default value for the second VALUES row.

I've tried a few things, such as explicitly using DEFAULT in the VALUES list, which doesn't work because the CTE has no idea what it's inserting in. I've also tried coalesce(col, DEFAULT) in the insert statement which didn't seem to work either. So, is it possible to do what I want?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Aktau
  • 1,847
  • 21
  • 30
  • 2
    A really ugly hack would be to get the default values out of `pg_attrdef` and paste that into your new_values list with the `COALESCE` function. – Patrick May 22 '14 at 02:16
  • `So I'd like for example for the legacy column to take on its default value for the second VALUES row.` You need to define that more closely: Do you want default values for the INSERT, the UPDATE or both? Overwrite existing values with defaults in the UPDATE case or not? – Erwin Brandstetter May 22 '14 at 03:41
  • @ErwinBrandstetter to respond to your query: the default value is only necessary in the INSERT clause. The UPDATE clause does not need the default values. – Aktau May 22 '14 at 07:38

1 Answers1

20

Postgres 9.5 or newer

UPSERT is now implemented properly with INSERT ... ON CONFLICT ... DO NOTHING | UPDATE. This largely simplifies the operation:

INSERT INTO playlist_items AS m (id, playlist, item, group_name, duration, sort, legacy)
VALUES
  (651, 21, 30012, 'a', 30, 1, FALSE)
, (DEFAULT, 21, 1, 'b', 34, 2, DEFAULT)  -- !
, (668, 21, 30012, 'c', 30, 3, FALSE)
, (7428, 21, 23068, 'd', 0, 4, FALSE)
ON     CONFLICT (id) DO UPDATE
SET   (         playlist,          item,          group_name,          duration,          sort,          legacy)
    = (EXCLUDED.playlist, EXCLUDED.item, EXCLUDED.group_name, EXCLUDED.duration, EXCLUDED.sort, EXCLUDED.legacy)
--    (...,  COALESCE(l.legacy, EXCLUDED.legacy))  -- alternative: see below
RETURNING m.id;

We can attach the VALUES clause to INSERT directly, which allows the DEFAULT keyword. In the case of unique violations on (id), Postgres updates instead. We can use excluded rows in the UPDATE. The manual:

The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the existing row using the table's name (or an alias), and to rows proposed for insertion using the special excluded table.

And:

Note that the effects of all per-row BEFORE INSERT triggers are reflected in excluded values, since those effects may have contributed to the row being excluded from insertion.

Remaining corner case

You have various options for the UPDATE: You can ...

  • ... not update at all: add a WHERE clause to the UPDATE to only write to selected rows.
  • ... only update selected columns.
  • ... only update if the column is currently NULL: COALESCE(l.legacy, EXCLUDED.legacy)
  • ... only update if the new value is NOT NULL: COALESCE(EXCLUDED.legacy, l.legacy)

But there is no way to discern DEFAULT values and values actually provided in the INSERT. Only resulting EXCLUDED rows are visible. If you need the distinction, fall back to the previous solution, where you have both at our disposal.

Postgres 9.4 or older

This is a tricky problem. You are running into this restriction (per documentation):

In a VALUES list appearing at the top level of an INSERT, an expression can be replaced by DEFAULT to indicate that the destination column's default value should be inserted. DEFAULT cannot be used when VALUES appears in other contexts.

Bold emphasis mine. Default values are not defined without a table to insert into. So there is no direct solution to your question, but there is a number of possible alternative routes, depending on exact requirements.

Fetch defaults values from system catalog?

You could fetch those from the system catalog pg_attrdef like @Patrick commented or from information_schema.columns. Complete instructions here:

But then you still only have a list of rows with a text representation of the expression to cook the default value. You would have to build and execute statements dynamically to get values to work with. Tedious and messy. Instead, we can let built-in Postgres functionality do that for us:

Simple shortcut

Insert a dummy row and have it returned to use generated defaults:

INSERT INTO playlist_items DEFAULT VALUES RETURNING *;

Problems / scope of the solution

This is only guaranteed to work for STABLE or IMMUTABLE default expressions. Most VOLATILE functions will work just as well, but there are no guarantees. The current_timestamp family of functions qualify as stable, since their values do not change within a transaction.

In particular, this has side effects on serial columns (or any other defaults drawing from a sequence). But that should not be a problem, because you don't normally write to serial columns directly. Those shouldn't be listed in INSERT statements at all.

Remaining flaw for serial columns: the sequence is still advanced by the single call to get a default row, producing a gap in the numbering. Again, that should not be a problem, because gaps are generally to be expected in serial columns.

Two more problems can be solved:

  • If you have columns defined NOT NULL, you have to insert dummy values and replace with NULL in the result.

  • We do not actually want to insert the dummy row. We could delete later (in the same transaction), but that may have more side effects, like triggers ON DELETE. There is a better way:

Avoid dummy row

Clone a temporary table including column defaults and insert into that:

BEGIN;
CREATE TEMP TABLE tmp_playlist_items (LIKE playlist_items INCLUDING DEFAULTS)
   ON COMMIT DROP;  -- drop at end of transaction

INSERT INTO tmp_playlist_items DEFAULT VALUES RETURNING *;
...

Same result, fewer side effects. Since default expressions are copied verbatim, the clone draws from the same sequences if any. But other side effects from the unwanted row or triggers are avoided completely.

Credit to Igor for the idea:

Remove NOT NULL constraints

You would have to provide dummy values for NOT NULL columns, because (per documentation):

Not-null constraints are always copied to the new table.

Either accommodate for those in the INSERT statement or (better) eliminate the constraints:

ALTER TABLE tmp_playlist_items
   ALTER COLUMN foo DROP NOT NULL
 , ALTER COLUMN bar DROP NOT NULL;

There is a quick and dirty way with superuser privileges:

UPDATE pg_attribute
SET    attnotnull = FALSE
WHERE  attrelid = 'tmp_playlist_items'::regclass
AND    attnotnull
AND    attnum > 0;

It is just a temporary table with no data and no other purpose, and it's dropped at the end of the transaction. So the shortcut is tempting. Still, the basic rule is: never tamper with system catalogs directly.

So, let's look into a clean way: Automate with dynamic SQL in a DO statement. You just need the regular privileges you are guaranteed to have since the same role created the temp table.

DO
$do$
BEGIN
   EXECUTE (
   SELECT 'ALTER TABLE tmp_playlist_items ALTER '
       || string_agg(quote_ident(attname), ' DROP NOT NULL, ALTER ')
       || ' DROP NOT NULL'
   FROM   pg_catalog.pg_attribute
   WHERE  attrelid = 'tmp_playlist_items'::regclass
   AND    attnotnull
   AND    attnum > 0
   );
END
$do$;

Much cleaner and still very fast. Execute care with dynamic commands and be wary of SQL injection. This statement is safe. See:

General solution (9.4 and older)

BEGIN;

CREATE TEMP TABLE tmp_playlist_items
   (LIKE playlist_items INCLUDING DEFAULTS) ON COMMIT DROP;

DO $$BEGIN
EXECUTE (
   SELECT 'ALTER TABLE tmp_playlist_items ALTER '
       || string_agg(quote_ident(attname), ' DROP NOT NULL, ALTER ')
       || ' DROP NOT NULL'
   FROM   pg_catalog.pg_attribute
   WHERE  attrelid = 'tmp_playlist_items'::regclass
   AND    attnotnull
   AND    attnum > 0
   );
END$$;

LOCK TABLE playlist_items IN EXCLUSIVE MODE;  -- forbid concurrent writes

WITH default_row AS (
   INSERT INTO tmp_playlist_items DEFAULT VALUES RETURNING *
   )
, new_values (id, playlist, item, group_name, duration, sort, legacy) AS (
   VALUES
      (651, 21, 30012, 'a', 30, 1, FALSE)
    , (NULL, 21, 1, 'b', 34, 2, NULL)
    , (668, 21, 30012, 'c', 30, 3, FALSE)
    , (7428, 21, 23068, 'd', 0, 4, FALSE)
   )
, upsert AS (  -- *not* replacing existing values in UPDATE (?)
   UPDATE playlist_items m
   SET   (  playlist,   item,   group_name,   duration,   sort,   legacy)
       = (n.playlist, n.item, n.group_name, n.duration, n.sort, n.legacy)
   --                                   ..., COALESCE(n.legacy, m.legacy)  -- see below
   FROM   new_values n
   WHERE  n.id = m.id
   RETURNING m.id
   )
INSERT INTO playlist_items
        (playlist,   item,   group_name,   duration,   sort, legacy)
SELECT n.playlist, n.item, n.group_name, n.duration, n.sort
                                   , COALESCE(n.legacy, d.legacy)
FROM   new_values n, default_row d   -- single row can be cross-joined
WHERE  NOT EXISTS (SELECT 1 FROM upsert u WHERE u.id = n.id)
RETURNING id;
   
COMMIT;

You only need the LOCK if you have concurrent transactions trying to write to the same table.

As requested, this only replaces NULL values in the column legacy in the input rows for the INSERT case. Can easily be extended to work for other columns or in the UPDATE case as well. For instance, you could UPDATE conditionally as well: only if the input value is NOT NULL. I added a commented line to the UPDATE above.

Aside: You do not need to cast values in any row but the first in a VALUES expression, since types are derived from the first row.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 1
    Brilliant answer, lots of useful bits (like only needing to cast the first row) as well. Very detailed. I'm going to test it out, report back and mark as answered if everything works out. The only thing I'm a bit concerned about is that the dummy row could have NOT NULL columns without a DEFAULT... for which I'd probably have to make up some temp values, no? – Aktau May 22 '14 at 06:33
  • 1
    Great question, genius answer. I have learnt a lot over my morning coffee. – John Powell May 22 '14 at 06:38
  • I've updated my answer with the things I encountered while testing your approach Erwin. I'd like to have your input on how I could proceed. To me, it's looking like I should just duplicate a/the default value on the client-side, ugly though it may be. – Aktau May 22 '14 at 07:44
  • @Aktau: I think I have found a fast and clean way to deal with NOT NULL constraints. Wanted to post earlier, but my internet died down. Consider the update. – Erwin Brandstetter May 22 '14 at 13:19
  • @ErwinBrandstetter: that is indeed a nice and generic solution, and it makes your already wonderful answer even more complete (thanks for that!). I don't let my apps run as a superuser though, so that's a bit out of the question (unless I misunderstood). – Aktau May 23 '14 at 08:29
  • @Aktau: More good news: you don't need additional privileges for the "clean way". Superuser privileges are only needed for the "quick and dirty" step, which I don't recommend anyway and is not in the final solution. I clarified that bit in the answer. You may want to edit the part in the Q that claims my solution wouldn't work .. :) – Erwin Brandstetter May 23 '14 at 13:12
  • @ErwinBrandstetter: first of, sorry if it seems like I implied that your solution doesn't work. Perhaps I worded it badly (unintentionally), but I did nuance it by saying that there were NOT NULL constraints (which I hadn't specified in my question). Anyway I apologize, I will promptly edit it. I will also test out your solution now that I know it doesn't require superuser privileges, which is great. Thanks for your expertise, I learned a lot about pg from it. – Aktau May 23 '14 at 22:02