Postgres UNIQUE CONSTRAINT for array

Question

How to create a constraint on the uniqueness of all the values in the array like:

CREATE TABLE mytable
(
    interface integer[2],
    CONSTRAINT link_check UNIQUE (sort(interface))
)

my sort function

create or replace function sort(anyarray)
returns anyarray as $$
select array(select $1[i] from generate_series(array_lower($1,1),
array_upper($1,1)) g(i) order by 1)
$$ language sql strict immutable;

I need that would be the value {10, 22} and {22, 10} considered the same and check under the UNIQUE CONSTRAINT

score 21 · Accepted Answer · edited Jul 04 '20 at 13:14

I don't think you can use a function with a unique constraint but you can with a unique index. So given a sorting function something like this:

create function sort_array(anyarray) returns anyarray as $$
    select array_agg(distinct n order by n) from unnest($1) as t(n);
$$ language sql immutable;

Then you could do this:

create table mytable (
    interface integer[2] 
);
create unique index mytable_uniq on mytable (sort_array(interface));

Then the following happens:

=> insert into mytable (interface) values (array[11,23]);
INSERT 0 1
=> insert into mytable (interface) values (array[11,23]);
ERROR:  duplicate key value violates unique constraint "mytable_uniq"
DETAIL:  Key (sort_array(interface))=({11,23}) already exists.
=> insert into mytable (interface) values (array[23,11]);
ERROR:  duplicate key value violates unique constraint "mytable_uniq"
DETAIL:  Key (sort_array(interface))=({11,23}) already exists.
=> insert into mytable (interface) values (array[42,11]);
INSERT 0 1

Erwin Brandstetter · Answer 2 · 2023-03-22T15:09:14.597

mu already demonstrated how an index on an expression can solve your problem.

My attention was caught by the used functions. Both seem like overkill for arrays of two integers. This may be a simplification of the real situation? Either way, I was intrigued and ran a test with a couple of variants.

Test setup

Temporary table with 10000 random pairs of integer:

CREATE TEMP TABLE arr (i int[]);

INSERT INTO arr 
SELECT ARRAY[(random() * 1000)::int, (random() * 1000)::int]
FROM   generate_series(1,10000);

Test candidates with a short comment to explain each one:

mu's query

CREATE OR REPLACE FUNCTION sort_array1(integer[])
  RETURNS int[]
  LANGUAGE sql STRICT IMMUTABLE AS
$func$
    SELECT array_agg(n) FROM (SELECT n FROM unnest($1) AS t(n) ORDER BY n) AS a;
$func$;

The same with ORDER BY inside aggregate (pg 9.0+)

CREATE OR REPLACE FUNCTION sort_array2(int[])
  RETURNS int[]
  LANGUAGE sql STRICT IMMUTABLE AS
$func$
SELECT array_agg(n ORDER BY n) FROM unnest($1) AS t(n);
$func$;

uralbash's query

CREATE OR REPLACE FUNCTION sort_array3(anyarray)
  RETURNS anyarray
  LANGUAGE sql STRICT IMMUTABLE AS
$func$
SELECT ARRAY(
    SELECT $1[i]
    FROM   generate_series(array_lower($1,1), array_upper($1,1)) g(i)
    ORDER  BY 1)
$func$;

Change parameter to int[]

CREATE OR REPLACE FUNCTION sort_array4(int[])
  RETURNS int[]
  LANGUAGE sql STRICT IMMUTABLE AS
$func$
SELECT ARRAY(
    SELECT $1[i]
    FROM   generate_series(array_lower($1,1), array_upper($1,1)) g(i)
    ORDER  BY 1)
$func$;

Simplify array_lower() - it is always 1

CREATE OR REPLACE FUNCTION sort_array5(int[])
  RETURNS int[]
  LANGUAGE sql STRICT IMMUTABLE AS
$func$
SELECT ARRAY(
    SELECT $1[i]
    FROM   generate_series(1, array_upper($1,1)) g(i)
    ORDER  BY 1)
$func$;

Further simplify to case with 2 elements

CREATE OR REPLACE FUNCTION sort_array6(int[])
  RETURNS int[]
  LANGUAGE sql STRICT IMMUTABLE AS
$func$
SELECT ARRAY(
    SELECT i
    FROM  (VALUES ($1[1]),($1[2])) g(i)
    ORDER  BY 1)
$func$;

My simple query

CREATE OR REPLACE FUNCTION sort_array7(int[])
  RETURNS int[]
  LANGUAGE sql STRICT IMMUTABLE AS
$func$
SELECT CASE WHEN $1[1] > $1[2] THEN ARRAY[$1[2], $1[1]] ELSE $1 END;
$func$;

Without STRICT modifier (!)

CREATE OR REPLACE FUNCTION sort_array8(int[])
  RETURNS int[]
  LANGUAGE sql IMMUTABLE AS
$func$
SELECT CASE WHEN $1[1] > $1[2] THEN ARRAY[$1[2], $1[1]] ELSE $1 END;
$func$;

Results

I executed each around 20 times and took the best result from EXPLAIN ANALYZE.

SELECT sort_array1(i) FROM arr  -- Total runtime: 183 ms
SELECT sort_array2(i) FROM arr  -- Total runtime: 175 ms

SELECT sort_array3(i) FROM arr  -- Total runtime: 183 ms
SELECT sort_array4(i) FROM arr  -- Total runtime: 183 ms
SELECT sort_array5(i) FROM arr  -- Total runtime: 177 ms
SELECT sort_array6(i) FROM arr  -- Total runtime: 144 ms

SELECT sort_array7(i) FROM arr  -- Total runtime: 103 ms
SELECT sort_array8(i) FROM arr  -- Total runtime:  43 ms (!!!)

These are the results from a v9.0.5 server on Debian Squeeze. Similar results on v.8.4.

I also tested PL/pgSQL variants which were a bit slower as expected: too much overhead for a tiny operation, no query plan to cache.

The simple function (nr. 7) is substantially faster than the others. That was expected, the overhead of the other variants is just too much for a tiny array.

Leaving away the STRICT declaration more than doubles the speed. I did not expect that and posted this follow-up uncover why:

Function executes faster without STRICT modifier?

score 10 · Answer 3 · answered Dec 10 '11 at 08:39

10

Just create a unique index on the two values:

create unique index ix on 
  mytable(least(interface[1], interface[2]), greatest(interface[1], interface[2]));

answered Dec 10 '11 at 08:39

2

+1 for another concise version! Performs almost as fast as the CASE statement. If the index is more useful holding two integers instead of an array, this is the way to go. – Erwin Brandstetter Dec 10 '11 at 10:45
The problem with this option is AFAICT you're not able to use the resulting index in your queries. – seanlinsley Jul 04 '15 at 21:43

Postgres UNIQUE CONSTRAINT for array

3 Answers3

Test setup

Results

Linked