
I wish to create a stored procedure (in PL/pgSQL, PostgreSQL 9.1) that first checks that the record about to be inserted is unique on four of its columns, or, when a record is updated, that it is updated to unique values.

  Example:
    Record (1,2,3,4) is to be inserted.
    If Record (1,2,3,4) already exists, then do not insert a duplicate record.
    If Record (1,2,3,4) does not exist, then insert it.

    Record (1,2,3,4) is to be updated to (5,6,7,8).
    If Record (5,6,7,8) already exists, then do not update the record. (duplicate record not allowed).
    If Record (5,6,7,8) does not exist, then update the record to the new values.

I previously had used a unique index on the record's fields, but would like to learn how a trigger is written to accomplish this.
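For concreteness, the record examples above assume a table shaped something like this (table and column names are my assumptions; the actual columns are wide character fields):

```sql
-- Hypothetical table; (1,2,3,4) in the examples stand for its four columns
CREATE TABLE tbl (
   col1 int,
   col2 int,
   col3 int,
   col4 int
);
```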

Erwin Brandstetter
Alan Wayne
  • And, did you try something ? – Houari Apr 28 '15 at 19:41
  • Search the web for keywords _postgresql_ and _upsert_ – Str. Apr 28 '15 at 19:43
  • Why don't you continue to use the unique index? That will be much more efficient –  Apr 28 '15 at 19:46
  • @Houari Tried and tried again ... failed miserably. (Not worth reprinting here :)). – Alan Wayne Apr 28 '15 at 22:12
  • @Houari I don't know how (or if ) I could step through the lines to verify the procedure-- i.e., how to debug this? – Alan Wayne Apr 28 '15 at 22:14
  • @a_horse_with_no_name I could, but not sure how to deal with failure on insert/update within a stored procedure, and the fields are very large, so I am guessing the index would become incredibly large on the 100,000 or so records. – Alan Wayne Apr 28 '15 at 22:18
  • @Str "upsert" is interesting but not quit what I was looking for here. Thanks. – Alan Wayne Apr 28 '15 at 22:46
  • 100.000 rows is nothing. Don't worry about _that_ number. Not sure what you mean with "*the columns are very large*" - if you have (four) large character values it seems rather uncommon to impose a unique constraint on them. Maybe if you described your underlying problem (including your **real** table definition) we can come up with a better solution. –  Apr 29 '15 at 05:43
  • @a_horse_with_no_name Being a newbie here, my columns are about 100 characters wide. Assuming a unique index on 4 such columns, at what point in column widths does an index in PostgreSQL become big enough to slow down insert/update/delete operations? Just curious. Thanks. – Alan Wayne Apr 29 '15 at 15:11

1 Answer


I previously had used a unique index on the record's fields, but would like to learn how a trigger is written to accomplish this.

This is a misunderstanding. If a set of columns is supposed to be unique, use a UNIQUE constraint (or make it the PK) in any case. And be aware of the special role of NULL values: NULL values are never considered equal, so a UNIQUE constraint does not prevent multiple rows that contain NULL in the constrained columns.
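A sketch of such a constraint (the constraint and table names are assumptions, chosen to match the examples below):

```sql
-- Enforce uniqueness of the four columns at the table level
ALTER TABLE tbl
   ADD CONSTRAINT my_4_col_uni UNIQUE (col1, col2, col3, col4);
```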

Postgres 9.5 or later

There is a much simpler solution now with INSERT ... ON CONFLICT ... DO NOTHING - a subset of the new "UPSERT":

INSERT INTO tbl (col1, col2, col3, col4)
VALUES (1, 2, 3, 4)
ON     CONFLICT ON CONSTRAINT my_4_col_uni DO NOTHING;
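Equivalently, you can specify the conflict target as a column list instead of naming the constraint (assuming the four-column unique constraint or index exists):

```sql
INSERT INTO tbl (col1, col2, col3, col4)
VALUES (1, 2, 3, 4)
ON     CONFLICT (col1, col2, col3, col4) DO NOTHING;
```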

The rest of the answer is largely outdated.

Postgres 9.4 or older

Triggers can help to enforce the constraint. But they fail to enforce uniqueness on their own due to inherent race conditions.

You can just let the unique constraint handle duplicate keys. You'll get an EXCEPTION for violations. To avoid raising exceptions for most duplicates (see the footnote on race conditions below) you can use a simple trigger:

CREATE OR REPLACE FUNCTION tbl_ins_up_before()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   IF EXISTS (SELECT 1 FROM tbl  -- SELECT without a list requires Postgres 9.4+
              WHERE (col1,     col2,     col3,     col4)
              = (NEW.col1, NEW.col2, NEW.col3, NEW.col4)) THEN
      RETURN NULL;
   END IF;

   RETURN NEW;
END
$func$;

CREATE TRIGGER ins_up_before
BEFORE INSERT OR UPDATE OF col1, col2, col3, col4  -- fire only when relevant
ON tbl
FOR EACH ROW EXECUTE PROCEDURE tbl_ins_up_before();

1 There is an inherent race condition in the time slice between checking if a row already exists and actually inserting the row, which cannot be avoided unless you lock the table exclusively (very expensive). Details depend on the exact definition of your constraint (may be deferrable). So you might still get an exception if a concurrent transaction also finds (at virtually the same moment) that (1,2,3,4) is not there yet and inserts before you. Or the operation might get aborted, but the existing row is deleted before you can commit.

This cannot be fixed with row-level locking either, because you cannot lock rows that aren't there yet (predicate locking) in Postgres, at least up to version 9.6.

You need a unique constraint, which enforces uniqueness at all times.

I would have the constraint and then use this query:

INSERT INTO tbl (col1, col2, col3, col4)
SELECT 1, 2, 3, 4
WHERE  NOT EXISTS (
   SELECT 1 FROM tbl
   WHERE (col1, col2, col3, col4) = (1, 2, 3, 4)
   );
Similar for UPDATE.
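The UPDATE counterpart might be sketched like this (the WHERE condition identifying the target row is an assumption):

```sql
UPDATE tbl
SET    col1 = 5, col2 = 6, col3 = 7, col4 = 8
WHERE  (col1, col2, col3, col4) = (1, 2, 3, 4)   -- row to update
AND    NOT EXISTS (
   SELECT 1 FROM tbl
   WHERE (col1, col2, col3, col4) = (5, 6, 7, 8) -- skip if target values exist
   );
```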

You could encapsulate the INSERT / UPDATE in a PL/pgSQL function and trap duplicate key violations.
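One way such a function might look (the name and signature are assumptions), trapping the unique_violation error condition:

```sql
CREATE OR REPLACE FUNCTION insert_unique(_c1 int, _c2 int, _c3 int, _c4 int)
  RETURNS boolean
  LANGUAGE plpgsql AS
$func$
BEGIN
   INSERT INTO tbl (col1, col2, col3, col4)
   VALUES (_c1, _c2, _c3, _c4);
   RETURN true;                     -- row was inserted
EXCEPTION WHEN unique_violation THEN
   RETURN false;                    -- duplicate; nothing inserted
END
$func$;
```

Note that the EXCEPTION block makes the function somewhat more expensive, since it forces a subtransaction internally, but it is safe under concurrent load where the EXISTS check alone is not.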

Erwin Brandstetter