49

I'm using PostgreSQL 9.0 and I have a table with just an artificial key (auto-incrementing sequence) and another unique key. (Yes, there is a reason for this table. :)) I want to look up an ID by the other key or, if it doesn't exist, insert it:

SELECT id
FROM mytable
WHERE other_key = 'SOMETHING'

Then, if no match:

INSERT INTO mytable (other_key)
VALUES ('SOMETHING')
RETURNING id

The question: is it possible to save a round-trip to the DB by doing both of these in one statement? I can insert the row if it doesn't exist like this:

INSERT INTO mytable (other_key)
SELECT 'SOMETHING'
WHERE NOT EXISTS (SELECT * FROM mytable WHERE other_key = 'SOMETHING')
RETURNING id

... but that doesn't give the ID of an existing row. Any ideas? There is a unique constraint on other_key, if that helps.

EMP
  • 59,148
  • 53
  • 164
  • 220
  • 2
    It's sad that you had so say "there's a reason for this table" to have both a auto-incrementing key AND a unique key. If at all possible, one should ALWAYS have a unique and not null key if using auto-incrementing surrogate keys. Otherwise, there is *nothing* preventing duplicate real information. I'm not saying that surrogate keys don't have their uses, but the pervading idea that having both is somehow wrong is a data quality disaster waiting to happen. – Matthew Wood Jul 17 '11 at 18:38
  • 2
    Related answer dealing with possible race conditions: http://stackoverflow.com/questions/15939902/is-select-or-insert-in-a-function-prone-to-race-conditions/15950324#15950324 – Erwin Brandstetter Aug 29 '14 at 01:36

6 Answers6

52

Have you tried to union it?


Edit - this requires Postgres 9.1:

create table mytable (id serial primary key, other_key varchar not null unique);

WITH new_row AS (
INSERT INTO mytable (other_key)
SELECT 'SOMETHING'
WHERE NOT EXISTS (SELECT * FROM mytable WHERE other_key = 'SOMETHING')
RETURNING *
)
SELECT * FROM new_row
UNION
SELECT * FROM mytable WHERE other_key = 'SOMETHING';

results in:

 id | other_key 
----+-----------
  1 | SOMETHING
(1 row)
fiatjaf
  • 11,479
  • 5
  • 56
  • 72
Denis de Bernardy
  • 75,850
  • 13
  • 131
  • 154
  • I suspect it'll work in postgres 9.1. At the very worse, 9.1 will allow to write: `with new_row as (insert ... returning *) select * from new_row ... union select * from ...` – Denis de Bernardy Jul 17 '11 at 10:31
  • Interesting, +1. I'll keep this in mind for the future when we upgrade to 9.1. – EMP Jul 17 '11 at 12:24
  • @EMP: I agree, fascinating to see this ability; then again, I'm looking forward to writable CTEs in general – vol7ron Nov 30 '12 at 17:16
  • You can also make `SELECT * FROM mytable WHERE other_key = 'SOMETHING'` a CTE since you use it twice. – ma11hew28 Apr 30 '14 at 12:42
  • @MattDiPasquale I've recently seen that, and the example I saw used a `UNION ALL`, which I'm not quite sure why, other than I think it yields performance. `UNION` and `UNION ALL` should both only return one record. – vol7ron Aug 19 '14 at 15:47
  • How about performance? This approach requires two index lookups (one for INSERT, another for SELECT). – greatvovan Oct 25 '19 at 17:53
  • @vol7ron, UNION ALL does not eliminate duplicates, in case INSERT worked, you will get two rows, unless you use LIMIT 1. – greatvovan Oct 25 '19 at 17:54
  • @greatvovan: there are better ways to do this now (see Aloyse's answer). – Denis de Bernardy Oct 25 '19 at 19:04
  • @DenisdeBernardy I saw it, but I doubt it is better. It does only one lookup (in theory), but UPDATE may require DB engine to move row tuple in the table which involves additional read-write IO as well as index update. It requires load test on big number of rows to find out what it better. – greatvovan Oct 26 '19 at 20:33
  • @greatvovan: You're most welcome to post your findings in an answer. I haven't touched Postgres in years, so I'm probably no longer the right person to ask how the performance differs between the old and the new approaches. I would point out, though, that intuitively Aloyee's approach can be improved with an in-memory table (such as an additional `with` statement), so that only has one index lookup occurs (to check whether the record exists). I'm not seeing any obvious good reason to store `existed` in a table. – Denis de Bernardy Oct 26 '19 at 21:15
2

using 9.5 i successfully tried this

  • based on Denis de Bernardy's answer
  • only 1 parameter
  • no union
  • no stored procedure
  • atomic, thus no concurrency problems (i think...)

The Query:

WITH neworexisting AS (
    INSERT INTO mytable(other_key) VALUES('hello 2') 
    ON CONFLICT(other_key) DO UPDATE SET existed=true -- need some update to return sth
    RETURNING * 
)
SELECT * FROM neworexisting

first call:

id|other_key|created            |existed|
--|---------|-------------------|-------|
 6|hello 1  |2019-09-11 11:39:29|false  |

second call:

id|other_key|created            |existed|
--|---------|-------------------|-------|
 6|hello 1  |2019-09-11 11:39:29|true   |

First create your table ;-)

CREATE TABLE mytable (
    id serial NOT NULL,
    other_key text NOT NULL,
    created timestamptz NOT NULL DEFAULT now(),
    existed bool NOT NULL DEFAULT false,
    CONSTRAINT mytable_pk PRIMARY KEY (id),
    CONSTRAINT mytable_uniq UNIQUE (other_key) --needed for on conflict
);
Aloyse
  • 29
  • 2
2

No, there is no special SQL syntax that allows you to do select or insert. You can do what Ilia mentions and create a sproc, which means it will not do a round trip fromt he client to server, but it will still result in two queries (three actually, if you count the sproc itself).

Erik Funkenbusch
  • 92,674
  • 28
  • 195
  • 291
  • @GiovanniP - no, Denis's answer only works in 9.1, not 9.0 and in any event I read the question as wanting a special sql statement to do this. – Erik Funkenbusch Jul 30 '13 at 15:35
  • 1
    @TruongSinh - sometimes the answer to a yes or no question is no. They can't always be constructive. In this case, the answer is no in regards to version 9.0. So I fail to understand your motivation for downvoting an accurate answer. – Erik Funkenbusch Apr 19 '14 at 04:17
1

you can use a stored procedure

IF (SELECT id FROM mytable WHERE other_key = 'SOMETHING' LIMIT 1) < 0 THEN
 INSERT INTO mytable (other_key) VALUES ('SOMETHING')
END IF
Ilia Choly
  • 18,070
  • 14
  • 92
  • 160
0

I have an alternative to Denis answer, that I think is less database-intensive, although a bit more complex:

create table mytable (id serial primary key, other_key varchar not null unique);
WITH table_sel AS (
    SELECT id
    FROM mytable
    WHERE other_key = 'test'
    UNION
    SELECT NULL AS id
    ORDER BY id NULLS LAST
    LIMIT 1
), table_ins AS (
    INSERT INTO mytable (id, other_key)
    SELECT
        COALESCE(id, NEXTVAL('mytable_id_seq'::REGCLASS)),
               'test'
    FROM table_sel
    ON CONFLICT (id) DO NOTHING
    RETURNING id
)
SELECT * FROM table_ins
UNION ALL
SELECT * FROM table_sel
WHERE id IS NOT NULL;

In table_sel CTE I'm looking for the right row. If I don't find it, I assure that table_sel returns at least one row, with a union with a SELECT NULL.

In table_ins CTE I try to insert the same row I was looking for earlier. COALESCE(id, NEXTVAL('mytable_id_seq'::REGCLASS)) is saying: id could be defined, if so, use it; whereas if id is null, increment the sequence on id and use this new value to insert a row. The ON CONFLICT clause assure that if id is already in mytable I don't insert anything.

At the end I put everything together with a UNION between table_ins and table_sel, so that I'm sure to take my sweet id value and execute both CTE.

This query needs to search for the value other_key only once, and is a "search this value" not a "check if this value not exists in the table", that is very heavy; in Denis alternative you use other_key in both types of searches. In my query you "check if a value not exists" only on id that is a integer primary key, that, for construction, is fast.

Sotis
  • 176
  • 1
  • 9
0

Minor tweak a decade late to Denis's excellent answer:

-- Create the table with a unique constraint
CREATE TABLE mytable (
          id serial PRIMARY KEY
,  other_key varchar NOT NULL UNIQUE
);


WITH new_row AS (
  -- Only insert when we don't find anything, avoiding a table lock if
  -- possible.
  INSERT INTO mytable ( other_key )
       SELECT 'SOMETHING'
        WHERE NOT EXISTS (
                SELECT *
                  FROM mytable
                 WHERE other_key = 'SOMETHING'
              )
    RETURNING *
)
(
  -- This comes first in the UNION ALL since it'll almost certainly be
  -- in the query cache. Marginally slower for the insert case, but also
  -- marginally faster for the much more common read-only case.
  SELECT *
    FROM mytable
   WHERE other_key = 'SOMETHING'

  -- Don't check for duplicates to be removed
  UNION ALL

  -- If we reach this point in iteration, we needed to do the INSERT and
  -- lock after all.
  SELECT *
    FROM new_row

) LIMIT 1 -- Just return whatever comes first in the results and allow
          -- the query engine to cut processing short for the INSERT
          -- calculation.
;

The UNION ALL tells the planner it doesn't have to collect results for de-duplication. The LIMIT 1 at the end allows the planner to short-circuit further processing/iteration once it knows there's an answer available.

NOTE: There is a race condition present here and in the original answer. If the entry does not already exist, the INSERT will fail with a unique constraint violation. The error can be suppressed with ON CONFLICT DO NOTHING, but the query will return an empty set instead of the new row. This is a difficult problem because getting that info from another transaction would violate the I in ACID.

Miles Elam
  • 1,440
  • 11
  • 19