2

This is my trivial test table,

create table test (
  id          int not null generated always as identity,
  first_name. varchar,

  primary key (id),
  unique(first_name)
);

As an alternative to insert-into-on-conflict sentences, I was trying to use the coalesce laziness to execute a select whenever possible or an insert, only when select fails to find a row.

coalesce laziness is described in documentation. See https://www.postgresql.org/docs/current/functions-conditional.html

Like a CASE expression, COALESCE only evaluates the arguments that are needed to determine the result; that is, arguments to the right of the first non-null argument are not evaluated. This SQL-standard function provides capabilities similar to NVL and IFNULL, which are used in some other database systems.

I also want to get back the id value of the row, having being inserted or not.

I started with:

select coalesce (
  (select id from test where first_name='carlos'),
  (insert into test(first_name) values('carlos') returning id)
);

but an error syntax error at or near "into" was found. See it on this other DBFiddle https://www.db-fiddle.com/f/t7TVkoLTtWU17iaTAbEhDe/0

Then I tried:

select coalesce (
  (select id from test where first_name='carlos'),
  (with r as (
    insert into test(first_name) values('carlos') returning id
   ) select id from r
  )
);

Here I am getting a WITH clause containing a data-modifying statement must be at the top level error that I don't understand, as insert is the first and only sentence within the with.

I am testing this with DBFiddle and PostgreSQL 13. The source code can be found at https://www.db-fiddle.com/f/hp8T1iQ8eS4wozDCBhBXDw/5

coterobarros
  • 941
  • 1
  • 16
  • 25
  • 2
    "*I expected the `insert into` was only fired when the first query returned `null`*" - no, that's not how sql works. It might run the second query at any time, and to get predictable results it asks you to put the `insert` at the top level. What you actually want [is an upsert](https://stackoverflow.com/q/4069718/1048572). – Bergi Aug 29 '21 at 10:58
  • Ok, That's enough to abandon the `coalesce` option. As I mention I am aware of `insert-into-on-conflict` and also `insert-select-where-not-exists` but, I need also to know the row id of the operation (select or insert), as the code is part of a sql function. – coterobarros Aug 29 '21 at 11:02
  • The reason I was trying to leave `insert-into-on-onflict` is I am forced to make an update to get the row id of an existing row. – coterobarros Aug 29 '21 at 11:04
  • 1
    Sorry to go again over the `coalesce` thing. This is what documentation says about its lazyness: *Like a CASE expression, COALESCE only evaluates the arguments that are needed to determine the result; that is, arguments to the right of the first non-null argument are not evaluated. This SQL-standard function provides capabilities similar to NVL and IFNULL, which are used in some other database systems.* Source: https://www.postgresql.org/docs/current/functions-conditional.html – coterobarros Aug 29 '21 at 11:08
  • It seems that my initial approach was right. – coterobarros Aug 29 '21 at 11:09
  • 1
    Ah, you don't want a plain upsert but [a insert-or-select](https://stackoverflow.com/a/42217872/1048572). I fear there is no simple solution to that. I've wished for native support of such functionality myself. – Bergi Aug 29 '21 at 11:38
  • Yes, all the mentioned reasons not to use `on-conflict` I read in your link are perfectly valid in my case, but the most important is to avoid an extra `update` on the row. I am coding a data warehouse ingestion process and the ratio of select:insert on the same row varies from 100-3000... You don't want to make 3000 updates just to get the id of the row you did not insert. – coterobarros Aug 29 '21 at 11:44
  • I landed to a solution. see the last EDIT section of my post However I still don't know why the simple coalesce (select, insert) renders an error – coterobarros Aug 29 '21 at 11:45
  • I rephrased my post to be clearer for the benefit of future readers. Thank you anyway. – coterobarros Aug 29 '21 at 12:01

3 Answers3

2

Different method: chained CTEs:


CREATE TABLE test
        ( id          INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY
        , first_name VARCHAR UNIQUE
        );

WITH sel AS (
        SELECT id FROM test WHERE first_name = 'carlos'
        )
, ins AS (
        INSERT INTO test(first_name) 
        SELECT 'carlos'
        WHERE NOT EXISTS (SELECT 1 FROM test WHERE first_name = 'carlos')
        RETURNING id
        )
, omg AS (
        SELECT id FROM sel
        UNION ALL 
        SELECT id FROM ins
        )
SELECT id
FROM omg
        ;
wildplasser
  • 43,142
  • 8
  • 66
  • 109
  • If I am not wrong, this executes the `insert into` clause always, which was what I was trying to avoid using the lazy coalesce evaluation process. Right? – coterobarros Aug 29 '21 at 11:23
  • This is what documentation says about its laziness: *Like a CASE expression, COALESCE only evaluates the arguments that are needed to determine the result; that is, arguments to the right of the first non-null argument are not evaluated. This SQL-standard function provides capabilities similar to NVL and IFNULL, which are used in some other database systems.* Source: postgresql.org/docs/current/functions-conditional.html – coterobarros Aug 29 '21 at 11:25
  • Maybe you failed to see the `WHERE NOT EXISTS (SELECT 1 FROM test ...)` part? The insert **is** executed, but it is allowed to insert zero rows. – wildplasser Aug 29 '21 at 21:03
  • ah ok, let me see. – coterobarros Aug 29 '21 at 21:07
  • yes, it works like a charm, in only one sentence! – coterobarros Aug 29 '21 at 21:13
0

This is a refinement on @wildplasser accepted answer. it avoids comparing first_name twice and uses coalesce instead of union all. Kind of an selsert in just one sentence.

with sel as (
  select id from test where first_name = 'carlos'
)
, ins as (
  insert into test(first_name) 
  select 'carlos'
  where (select id from sel) is null
  returning id
)
select coalesce (
  (select id from sel),
  (select id from ins)
);

See it at https://www.db-fiddle.com/f/goRh4TyAebTkEZFHk6WbtK/6

coterobarros
  • 941
  • 1
  • 16
  • 25
  • It is not a refinement. It is just you insisting on using `coalesce()` for (possibly) non-scalars. – wildplasser Aug 29 '21 at 22:04
  • refinement comes, primarily, from avoiding a second comparison with `first_name`. The real case behind this example implies comparing URLs and even larger strings. – coterobarros Aug 29 '21 at 22:06
  • As I mention somewhere, this is part of a data warehouse ingestion system, and every select counts, as the 'selsert' will be run hundreds of thousands of times per session. – coterobarros Aug 29 '21 at 22:13
  • Please learn some SQL, before boasting about `hundreds of thousands`. There is a slight possibility that the people who answer your question have more knowledge and/or experience than you. (for batch-inserts the ideal solution could be different for 100K records, for instance) – wildplasser Aug 29 '21 at 22:30
0

It seems that the returning value from the insert into clause is not equivalent in nature to the scalar query of a select clause. So I try encapsulating the insert into into an SQL function and it worked.

create or replace function insert_first_name(
  _first_name varchar
) returns int
language sql as $$
  insert into test (first_name) 
  values (_first_name) 
  returning id;
$$;

select coalesce (
  (select id from test where first_name='carlos'),
  (select insert_first_name('carlos'))
);

See it on https://www.db-fiddle.com/f/73rVXgqGfrG4VmjrAk6Z3i/2

coterobarros
  • 941
  • 1
  • 16
  • 25