For example, I have this table (PostgreSQL):

CREATE TABLE t(
 a TEXT,
 b TEXT
);
CREATE UNIQUE INDEX t_a_uniq_idx ON t(a);

I want to create a unique constraint/index for the b and a columns, but not a simple ADD CONSTRAINT t_ab UNIQUE (a, b). I want values of b to be unique against values of a:

INSERT INTO t(a,b) VALUES('123', null); -- this is ok
INSERT INTO t(a,b) VALUES('456', '123'); -- this is not ok, because duplicate '123'

How can I do this?

Edit:

Why do I need this? For example, if I have a users table and I want to build an email-changing feature, I need a structure like this:

CREATE TABLE users(
 email TEXT,
 unconfirmed_email TEXT
 -- some other data
);
CREATE UNIQUE INDEX unq_users_email_idx ON users(email);

A user can set a value in the unconfirmed_email column, but only if this value is not already used in the email column.

Kroid

3 Answers


If uniqueness is needed across both columns, I think you have the wrong data model. Instead of storing pairs on a single row, you should have two tables:

create table pairs (
    pairid int generated always as identity,
    . . .   -- more information about the pair, if needed
);

create table pairElements (
    pairElementId int generated always as identity,
    pairId int references pairs(pairid),
    which int check (which in (1, 2)),
    value text,
    unique (pairid, which)
);

Then the condition is simple:

alter table pairelements add constraint unq_pairelements_value unique (value);
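
For illustration, a sketch of how the question's example rows might map onto this model, assuming the unique constraint on pairelements(value) above, that pairs has no other required columns, and that the identity column assigns pairid 1 and 2:

insert into pairs default values;                                      -- pair 1
insert into pairelements (pairid, which, value) values (1, 1, '123');  -- ok: ('123', null)

insert into pairs default values;                                      -- pair 2
insert into pairelements (pairid, which, value) values (2, 1, '456');  -- ok
insert into pairelements (pairid, which, value) values (2, 2, '123');  -- rejected: duplicate value '123'
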
Gordon Linoff
  • Yes, this could work. But I don't want to join two tables for every search request. Isn't there an easier solution? – Kroid May 31 '20 at 05:32

As I understand it, you want a UNIQUE index over a and b combined.
Your update narrowed it down (a value in b shall not exist in a). This solution is stricter.

Solution

After trying and investigating quite a bit (see below!) I came up with this:

ALTER TABLE tbl ADD CONSTRAINT a_not_equal_b CHECK (a <> b);
ALTER TABLE tbl ADD CONSTRAINT ab_unique
   EXCLUDE USING gist ((ARRAY[hashtext(COALESCE(a, ''))
                            , hashtext(COALESCE(b, ''))]) gist__int_ops WITH &&);

db<>fiddle here

Since the exclusion constraint won't currently (pg 12) work with text[], I work with int4[] of hash values. hashtext() is the built-in hash function that's also used for hash-partitioning (among other uses). Seems perfect for the job.
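
To see the mechanism on the question's example (a sketch; hashtext() output values vary, but equal inputs always produce equal hashes):

-- row ('123', NULL) stores hashes of ('123', ''), row ('456', '123') stores
-- hashes of ('456', '123'); both arrays contain hashtext('123'), so they
-- overlap and the second row violates the exclusion constraint
SELECT ARRAY[hashtext('123'), hashtext('')]
    && ARRAY[hashtext('456'), hashtext('123')] AS overlaps;  -- true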

The operator class gist__int_ops is provided by the additional module intarray, which has to be installed once per database. It's optional: the solution works with the default array operator class as well; just drop gist__int_ops to fall back. But intarray is faster.
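
For reference, the module is installed like any other contrib extension (assuming the contrib package is available on the server):

CREATE EXTENSION IF NOT EXISTS intarray;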

Caveats

  • int4 may not be big enough to rule out hash collisions sufficiently. You might want to go with bigint instead. But that's more expensive and can't use the gist__int_ops operator class to improve performance. Your call.

  • Unicode has the dismal property that equal strings can be encoded in different ways. If you work with Unicode (typical encoding UTF8) and use non-ASCII characters (and this matters to you), compare normalized forms to rule out such duplicates. The upcoming Postgres 13 adds the function normalize() for that purpose. This is a general caveat of character type duplicates, though, not specific to my solution.

  • A NULL value is allowed, but it collides with the empty string (''). I would rather go with NOT NULL columns and drop COALESCE() from the expressions; see the sketch after this list.
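
A minimal sketch of that stricter variant, assuming both columns can indeed be declared NOT NULL:

ALTER TABLE tbl
  ALTER COLUMN a SET NOT NULL
, ALTER COLUMN b SET NOT NULL;

ALTER TABLE tbl ADD CONSTRAINT ab_unique  -- replaces the COALESCE-based constraint of the same name above
   EXCLUDE USING gist ((ARRAY[hashtext(a), hashtext(b)]) gist__int_ops WITH &&);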

Obstacle course to an exclusion constraint

My first thought was: exclusion constraint. But it falls through:

ALTER TABLE tbl ADD CONSTRAINT ab_unique EXCLUDE USING gist ((ARRAY[a,b]) WITH &&);
ERROR:  data type text[] has no default operator class for access method "gist"
HINT:  You must specify an operator class for the index or define a default operator class for the data type.

There is an open TODO item for this.

But can't we use a GIN index for text[]? Alas, no:

ALTER TABLE tbl ADD CONSTRAINT ab_unique EXCLUDE USING gin ((ARRAY[a,b]) WITH &&);
ERROR:  access method "gin" does not support exclusion constraints

Why? The manual:

The access method must support amgettuple (see Chapter 61); at present this means GIN cannot be used.

It seems hard to implement, so don't hold your breath.

If a and b were integer columns, we could make it work with an integer array:

ALTER TABLE tbl ADD CONSTRAINT ab_unique EXCLUDE USING gist ((ARRAY[a,b]) WITH &&);

Or with the gist__int_ops operator class from the additional module intarray (typically faster):

ALTER TABLE tbl ADD CONSTRAINT ab_unique EXCLUDE USING gist ((ARRAY[a,b]) gist__int_ops WITH &&);

To also forbid duplicates within the same row, add a CHECK constraint:

ALTER TABLE tbl ADD CONSTRAINT a_not_equal_b CHECK (a <> b);

Remaining issue: this does not work with NULL values.

Workaround

Add a helper table to store values from a and b in one column:

CREATE TABLE tbl_ab(ab text PRIMARY KEY);

Main table, like you had it, plus FK constraints.

CREATE TABLE tbl (
  a text REFERENCES tbl_ab ON UPDATE CASCADE ON DELETE CASCADE
, b text REFERENCES tbl_ab ON UPDATE CASCADE ON DELETE CASCADE
);

Use a function like this to INSERT:

CREATE OR REPLACE FUNCTION f_tbl_insert(_a text, _b text)
 RETURNS void
 LANGUAGE sql AS
$func$
WITH ins_ab AS (
   INSERT INTO tbl_ab(ab)
   SELECT _a WHERE _a IS NOT NULL  -- NULL is allowed (?)
   UNION ALL
   SELECT _b WHERE _b IS NOT NULL
   )
INSERT INTO tbl(a,b)
VALUES (_a, _b);
$func$;

db<>fiddle here
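
Calling it with the question's example values behaves like this (a sketch):

SELECT f_tbl_insert('123', NULL);   -- ok: '123' goes into tbl_ab and tbl.a
SELECT f_tbl_insert('456', '123');  -- fails: '123' is already in tbl_ab (PK violation)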

Or implement a trigger to take care of it in the background.

CREATE OR REPLACE FUNCTION trg_tbl_insbef()
  RETURNS trigger AS
$func$
BEGIN
   INSERT INTO tbl_ab(ab)
   SELECT NEW.a WHERE NEW.a IS NOT NULL  -- NULL is allowed (?)
   UNION ALL
   SELECT NEW.b WHERE NEW.b IS NOT NULL;

   RETURN NEW;
END
$func$ LANGUAGE plpgsql;

CREATE TRIGGER tbl_insbef
BEFORE INSERT ON tbl
FOR EACH ROW EXECUTE PROCEDURE trg_tbl_insbef();

db<>fiddle here

NULL handling can be changed as desired.

Either way, while the added (optional) FK constraints ensure that we can't sidestep the helper table tbl_ab, and allow UPDATE and DELETE in tbl_ab to cascade, you still need to propagate UPDATE and DELETE to the helper table as well (or implement more triggers). Tricky corner cases, but there are solutions. I won't go into that, having found the solution above with an exclusion constraint using hashtext() ...

Erwin Brandstetter

While this leads to an interesting problem, I agree that the data could be modelled better. Specifically, the column unconfirmed_email can be seen as combining two attributes: an association between an address and a user, which it shares with the email column; and a status of whether that address is confirmed, which is dependent on the combination of user and address, not on one or the other.

This implies that a new table, user_emails, should be extracted (a possible definition is sketched after this list):

  • user_id - foreign key to users
  • email - non-nullable
  • is_confirmed boolean
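
A possible definition (a sketch; it assumes users has a surrogate primary key id, which the question's table does not show):

CREATE TABLE user_emails (
    user_id      int     NOT NULL REFERENCES users (id),  -- assumes users.id exists
    email        text    NOT NULL,
    is_confirmed boolean NOT NULL DEFAULT false
    -- plus the natural columns discussed below
);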

Interestingly, as often turns out to be the case, this extracted table has natural data that could be added:

  • When was the address added?
  • When was it confirmed?
  • What is the verification code sent to the user?
  • If the user is allowed multiple addresses, which is the primary, or which is to be used for a particular purpose?

We can now model various constraints on this table (using unique indexes in some cases because you can't specify Where on a unique constraint); a SQL sketch follows the list:

  • Each user can only have one association (whether confirmed or not) with a particular e-mail address: Constraint Unique ( user_id, email )
  • An e-mail address can only be confirmed for one user: Unique Index On user_emails ( email ) Where is_confirmed Is True;
  • Each user can have only one confirmed address: Unique Index On user_emails ( user_id ) Where is_confirmed Is True;. You might want to adjust this to allow users to confirm multiple addresses, but have a single "primary" address.
  • Each user can have only one unconfirmed address: Unique Index On user_emails ( user_id ) Where is_confirmed Is False;. This is implied in your current design, but may not actually be necessary.
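
Spelled out in SQL (a sketch; the index and constraint names are made up):

ALTER TABLE user_emails
    ADD CONSTRAINT user_emails_user_id_email_key UNIQUE (user_id, email);

CREATE UNIQUE INDEX one_owner_per_confirmed_email
    ON user_emails (email) WHERE is_confirmed IS TRUE;

CREATE UNIQUE INDEX one_confirmed_email_per_user
    ON user_emails (user_id) WHERE is_confirmed IS TRUE;

CREATE UNIQUE INDEX one_unconfirmed_email_per_user
    ON user_emails (user_id) WHERE is_confirmed IS FALSE;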

This leaves us with a re-worded version of your original problem: how do we forbid an unconfirmed row with the same email as a confirmed row, while allowing multiple identical unconfirmed rows?

One approach would be to use an Exclude constraint for rows where email matches, but is_confirmed doesn't match. The cast to int is necessary because creating the gist index on boolean fails.

Alter Table user_emails
Add Constraint unconfirmed_must_not_match_confirmed
Exclude Using gist ( 
    email With =,
    Cast(is_confirmed as Int) With <>
);

On its own, this would allow multiple copies of email, as long as they all had the same value of is_confirmed. But since we've already constrained against multiple rows where is_confirmed Is True, the only duplicates remaining will be where is_confirmed Is False on all matching rows.
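
For example (a sketch with hypothetical user ids; note that the gist operator classes for plain text and integer comparisons come from the btree_gist extension):

INSERT INTO user_emails (user_id, email, is_confirmed) VALUES (1, 'a@example.com', true);   -- ok
INSERT INTO user_emails (user_id, email, is_confirmed) VALUES (2, 'a@example.com', false);  -- rejected: clashes with the confirmed row
INSERT INTO user_emails (user_id, email, is_confirmed) VALUES (2, 'b@example.com', false);  -- ok
INSERT INTO user_emails (user_id, email, is_confirmed) VALUES (3, 'b@example.com', false);  -- ok: duplicates allowed while unconfirmed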


Here is a db<>fiddle demonstrating the above design: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=fd8e4e6a4cce79d9bc6bf07111e68df9

IMSoP