We have a legacy database schema that has some interesting design decisions. Until recently, we have only supported Oracle and SQL Server, but we are trying to add support for PostgreSQL, which has brought up an interesting problem. I have searched Stack Overflow and the rest of the internet and I don't believe this particular situation is a duplicate.
Oracle and SQL Server both behave the same when it comes to nullable columns in a unique constraint, which is to essentially ignore the columns that are NULL when performing the unique check.
Let's say I have the following table and constraint:
CREATE TABLE EXAMPLE
(
ID TEXT NOT NULL PRIMARY KEY,
FIELD1 TEXT NULL,
FIELD2 TEXT NULL,
FIELD3 TEXT NULL,
FIELD4 TEXT NULL,
FIELD5 TEXT NULL,
...
);
CREATE UNIQUE INDEX EXAMPLE_INDEX ON EXAMPLE
(
FIELD1 ASC,
FIELD2 ASC,
FIELD3 ASC,
FIELD4 ASC,
FIELD5 ASC
);
On both Oracle and SQL Server, leaving any of the nullable columns NULL results in a uniqueness check on only the non-null columns, so each of the following inserts can only be done once:
INSERT INTO EXAMPLE VALUES ('1','FIELD1_DATA', NULL, NULL, NULL, NULL );
INSERT INTO EXAMPLE VALUES ('2','FIELD1_DATA','FIELD2_DATA', NULL, NULL,'FIELD5_DATA');
-- On PostgreSQL these succeed, when they should violate the unique constraint:
INSERT INTO EXAMPLE VALUES ('3','FIELD1_DATA', NULL, NULL, NULL, NULL );
INSERT INTO EXAMPLE VALUES ('4','FIELD1_DATA','FIELD2_DATA', NULL, NULL,'FIELD5_DATA');
However, because PostgreSQL (correctly) adheres to the SQL Standard, those insertions (and any other combination of values, as long as at least one of them is NULL) do not raise an error and the rows are inserted without complaint. Unfortunately, because of our legacy schema and the supporting code, we need PostgreSQL to behave the same as SQL Server and Oracle.
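As I understand it, the difference comes down to how NULL compares with NULL. The following is only a small illustration of that comparison behaviour on PostgreSQL with default settings; the column aliases are made up for the example:

```sql
-- Ordinary equality between two NULLs yields NULL (unknown), not true,
-- so a unique index sees two NULL keys as distinct.
-- IS NOT DISTINCT FROM is the NULL-safe comparison and yields true.
SELECT NULL = NULL                    AS equals_result,
       NULL IS NOT DISTINCT FROM NULL AS not_distinct_result;
```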
I am aware of the following Stack Overflow question and its answers: Create unique constraint with null columns. From my understanding, there are two strategies to solve this problem:
- Create partial indexes that describe the index in cases where the nullable columns are both NULL and NOT NULL (which results in exponential growth of the number of partial indexes)
- Use COALESCE with a sentinel value on the nullable columns in the index.
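To make the first strategy concrete, here is a rough sketch of what I believe it looks like when reduced to two nullable columns. The table EXAMPLE_PARTIAL and the index names are hypothetical and exist only for this illustration. The all-NULL combination is left out, since whether it needs its own index depends on the behaviour being matched:

```sql
-- Hypothetical two-column reduction of EXAMPLE, for illustration only.
CREATE TABLE EXAMPLE_PARTIAL
(
    ID     TEXT NOT NULL PRIMARY KEY,
    FIELD1 TEXT NULL,
    FIELD2 TEXT NULL
);

-- Both columns present: the pair must be unique.
CREATE UNIQUE INDEX EXAMPLE_PARTIAL_F1_F2 ON EXAMPLE_PARTIAL (FIELD1, FIELD2)
    WHERE FIELD1 IS NOT NULL AND FIELD2 IS NOT NULL;

-- FIELD2 is NULL: FIELD1 alone must be unique among such rows.
CREATE UNIQUE INDEX EXAMPLE_PARTIAL_F1 ON EXAMPLE_PARTIAL (FIELD1)
    WHERE FIELD1 IS NOT NULL AND FIELD2 IS NULL;

-- FIELD1 is NULL: FIELD2 alone must be unique among such rows.
CREATE UNIQUE INDEX EXAMPLE_PARTIAL_F2 ON EXAMPLE_PARTIAL (FIELD2)
    WHERE FIELD1 IS NULL AND FIELD2 IS NOT NULL;

INSERT INTO EXAMPLE_PARTIAL VALUES ('1', 'FIELD1_DATA', NULL);
-- Rejected with a unique violation on EXAMPLE_PARTIAL_F1:
INSERT INTO EXAMPLE_PARTIAL VALUES ('2', 'FIELD1_DATA', NULL);
```

Every extra nullable column doubles the number of NULL / NOT NULL combinations that need an index of their own.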
The problem with (1) is that the number of partial indexes we'd need to create grows exponentially with each additional nullable column we'd like to add to the constraint (2^N, if I am not mistaken). The problems with (2) are that a sentinel value reduces the set of values available for that column, and that the approach carries potential performance problems.
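To show the sentinel concern with the second strategy, here is a minimal sketch, again on a hypothetical two-column table. The table name EXAMPLE_SENTINEL, the index name, and the sentinel string '<null>' are all assumptions chosen for the example, not anything from our schema:

```sql
-- Hypothetical two-column reduction of EXAMPLE, for illustration only.
CREATE TABLE EXAMPLE_SENTINEL
(
    ID     TEXT NOT NULL PRIMARY KEY,
    FIELD1 TEXT NULL,
    FIELD2 TEXT NULL
);

-- Expression index: every NULL is mapped to the sentinel before comparison,
-- so two NULLs in the same column compare as equal.
CREATE UNIQUE INDEX EXAMPLE_SENTINEL_INDEX ON EXAMPLE_SENTINEL
(
    (COALESCE(FIELD1, '<null>')),
    (COALESCE(FIELD2, '<null>'))
);

-- Run each statement on its own (autocommit), since a failed statement
-- aborts the surrounding transaction.
INSERT INTO EXAMPLE_SENTINEL VALUES ('1', 'FIELD1_DATA', NULL);

-- Rejected, which is the behaviour we want:
INSERT INTO EXAMPLE_SENTINEL VALUES ('2', 'FIELD1_DATA', NULL);

-- Also rejected, because the literal value collides with the sentinel.
-- This is the "reduced set of available values" problem:
INSERT INTO EXAMPLE_SENTINEL VALUES ('3', 'FIELD1_DATA', '<null>');
```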
My question: are these the only two solutions to this problem? If so, what are the tradeoffs between them for this particular use case? A good answer would discuss the performance of each solution, the maintainability, how PostgreSQL would utilize these indexes in simple SELECT statements, and any other "gotchas" or things to be aware of. Keep in mind that 5 nullable columns was only for an example; we have some tables in our schema with up to 10 (yes, I cry every time I see it, but it is what it is).