10

Given the following three columns in a Postgres database: first, second, third; how can I create a constraint such that permutations are unique?

E.g. If ('foo', 'bar', 'shiz') exist in the db, ('bar', 'shiz', 'foo') would be excluded as non-unique.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
KevDog
  • 5,763
  • 9
  • 42
  • 73
  • I'd like to know this too. Just out of curiosity. – ffflabs Aug 21 '13 at 14:32
  • 2
    Can your columns be NULL? Can they be empty? Define "unique" then. Your Postgres version? And I don't see a table definition for testing in your question? – Erwin Brandstetter Aug 21 '13 at 14:57
  • Non null. 9.2.2 The values are all text. – KevDog Aug 21 '13 at 15:05
  • 1
    You might want to upgrade to 9.2.4 to snuff out the security vulnerability patched in 9.2.4. More details at http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2013-1899 – bma Aug 21 '13 at 15:07

4 Answers4

6

You could use hstore to create the unique index:

CREATE UNIQUE INDEX hidx ON test USING BTREE (hstore(ARRAY[a,b,c], ARRAY[a,b,c]));

Fiddle

UPDATE

Actually

CREATE UNIQUE INDEX hidx ON test USING BTREE (hstore(ARRAY[a,b,c], ARRAY[null,null,null]));

might be a better idea since it will work the same but should take less space (fiddle).

Jakub Kania
  • 15,665
  • 2
  • 37
  • 47
5

For only three columns this unique index using only basic expressions should perform very well. No additional modules like hstore or custom function needed:

CREATE UNIQUE INDEX t_abc_uni_idx ON t (
  LEAST(a,b,c)
, GREATEST(LEAST(a,b), LEAST(b,c), LEAST(a,c))
, GREATEST(a,b,c)
);

SQL fiddle

Also needs the least disk space:

SELECT pg_column_size(row(hstore(t))) AS hst_row
      ,pg_column_size(row(hstore(ARRAY[a,b,c], ARRAY[a,b,c]))) AS hst1
      ,pg_column_size(row(hstore(ARRAY[a,b,c], ARRAY[null,null,null]))) AS hst2
      ,pg_column_size(row(ARRAY[a,b,c])) AS arr
      ,pg_column_size(row(LEAST(a,b,c)
                        , GREATEST(LEAST(a,b), LEAST(b,c), LEAST(a,c))
                        , GREATEST(a,b,c))) AS columns
FROM t;

 hst_row | hst1 | hst2 | arr | columns
---------+------+------+-----+---------
      59 |   59 |   56 |  69 |      30

Numbers are bytes for index row in the example in the fiddle, measured with pg_column_size(). My example uses only single characters, the difference in size is constant.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
3

You can do this by creating a unique index on a function which returns a sorted array of the values in the columns:

CREATE OR REPLACE FUNCTION sorted_array(anyarray)
RETURNS anyarray
AS $BODY$
  SELECT array_agg(x) FROM (SELECT unnest($1) AS x FROM test ORDER BY x) AS y;
$BODY$
LANGUAGE sql IMMUTABLE;

CREATE UNIQUE index ON test (sorted_array(array[first,second,third]));
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
qqx
  • 18,947
  • 4
  • 64
  • 68
  • Might be worth checking what the performance impact of a couple of nested CASE expressions to manually sort would be too. With only 3 columns that should be viable. Might not be worth the extra time though. – Richard Huxton Aug 21 '13 at 15:07
  • Richard, performance not a driver here. Small app, maybe 20-100 inserts a day. – KevDog Aug 21 '13 at 15:09
0

Suggestion from co-worker, variation of @julien's idea:

Sort the terms alphabetically and place a delimiter on either side of each term. Concatenate them and place them in a separate field that becomes the primary key.

Why the delimiter? So that, "a", "aa", "aaa" and "aa", "aa", "aa" can both be inserted.

KevDog
  • 5,763
  • 9
  • 42
  • 73