45

This is one strange, unwanted behavior I encountered in Postgres: When I create a Postgres table with composite primary keys, it enforces NOT NULL constraint on each column of the composite combination.

For example,

CREATE TABLE distributors (m_id integer, x_id integer, PRIMARY KEY(m_id, x_id));

enforces NOT NULL constraint on columns m_id and x_id, which I don't want! MySQL doesn't do this. I think Oracle doesn't do it as well.

I understand that PRIMARY KEY enforces UNIQUE and NOT NULL automatically but that makes sense for single-column primary key. In a multi-column primary key table, the uniqueness is determined by the combination.

Is there any simple way of avoiding this behavior of Postgres? When I execute this:

CREATE TABLE distributors (m_id integer, x_id integer);

I do not get any NOT NULL constraints of course. But I would not have a primary key either.

tanius
  • 14,003
  • 3
  • 51
  • 63
user3422637
  • 3,967
  • 17
  • 49
  • 72
  • 4
    Well: it is the standard. Imagine two tuples being present, both with `x_id = NULL`. Now, if you wanted to update _only one_ of these, you would not be able to address only one, that would violate 2NF. (all this because: **NULL is not a value**) Note that for the DBMS at ddl-time, there are two choices: either refusing the definition, or converting the key-fields to not-nullable automatically. – wildplasser Oct 30 '14 at 00:11
  • I see what you're saying. I am trying to migrate from MySQL to Postgres and hence this inconsistency causes a problem. Can you elaborate more on "Note that for the DBMS at ddl-time, there are two choices: either refusing the definition, or converting the key-fields to not-nullable automatically." – user3422637 Oct 30 '14 at 00:24
  • 1
    It is an illegal definition. So the DBMS can refuse it. Or correct it. (You could compare it to a C-compiler confronted with the definition `int i=1.0;` ) – wildplasser Oct 30 '14 at 00:27
  • 7
    You are wrong. MySQL, Oracle, SQL-Server, all have the same exact behaviour as Postgres. No nulls are allowed in columns that are part of the primary key. – ypercubeᵀᴹ Oct 30 '14 at 00:38
  • 3
    There must be a misunderstanding. Like @ypercube commented: Oracle does not allow NULL values in a PK column: http://sqlfiddle.com/#!4/52cf2/1. Neither does MySQL: http://sqlfiddle.com/#!2/ce8d4e. – Erwin Brandstetter Oct 30 '14 at 01:17
  • 4
    It's not strange behavior. A collection of columns that can include NULLS is not, by definition, a candidate key. This has been part of the relational model since 1970. If it can't be a candidate key, then it can't be the primary key. It would be strange if Postgres did *not* enforce it. – Walter Mitty Oct 30 '14 at 08:42

2 Answers2

70

If you need to allow NULL values, use a UNIQUE constraint (or index) instead of a PRIMARY KEY (and add a surrogate PK column - I suggest a serial or IDENTITY column in Postgres 10 or later).

A UNIQUE constraint allows columns to be NULL:

CREATE TABLE distributor (
  distributor_id GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, m_id integer
, x_id integer
, UNIQUE(m_id, x_id)  -- !
-- , CONSTRAINT distributor_my_name_uni UNIQUE (m_id, x_id)  -- verbose form
);

The manual:

For the purpose of a unique constraint, null values are not considered equal, unless NULLS NOT DISTINCT is specified.

In your case, you could enter something like (1, NULL) for (m_id, x_id) any number of times without violating the constraint. Postgres never considers two NULL values equal - as per definition in the SQL standard.

If you need to treat NULL values as equal (i.e. "not distinct") to disallow such "duplicates", I see two three (since Postgres 15) options:

0. NULLS NOT DISTINCT

This option was added with Postgres 15 and allows to treat NULL values as "not distinct", so two of them conflict in a unique constraint or index. This is the most convenient option, going forward. The manual:

That means even in the presence of a unique constraint it is possible to store duplicate rows that contain a null value in at least one of the constrained columns. This behavior can be changed by adding the clause NULLS NOT DISTINCT ...

Detailed instructions:

1. Two partial indexes

In addition to the UNIQUE constraint above:

CREATE UNIQUE INDEX dist_m_uni_idx ON distributor (m_id) WHERE x_id IS NULL;
CREATE UNIQUE INDEX dist_x_uni_idx ON distributor (x_id) WHERE m_id IS NULL;

But this gets out of hands quickly with more than two columns that can be NULL. See:

2. A multi-column UNIQUE index on expressions

Instead of the UNIQUE constraint. We need a free default value that is never present in involved columns, like -1. Add CHECK constraints to disallow it:

CREATE TABLE distributor (
   distributor serial PRIMARY KEY
 , m_id integer
 , x_id integer
 , CHECK (m_id &lt> -1)
 , CHECK (x_id &lt> -1)
);
CREATE UNIQUE INDEX distributor_uni_idx
ON distributor (COALESCE(m_id, -1), COALESCE(x_id, -1));
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 2
    MySQL behaves the same exact way as Postgres in this aspect. The OP is wrong in his assertions. – ypercubeᵀᴹ Oct 30 '14 at 00:41
  • 1
    That is a nice (but ugly) workaround. Sigh. @ypercube : could depend on the engine, IIRC. Or the phase of the moon... – wildplasser Oct 30 '14 at 00:43
  • 1
    @wildplasser No, it doesn't depend on the engine (at least myisam and innodb, the 2 most common engines.) Perhaps the OP is working on some ancient version of mysql. – ypercubeᵀᴹ Oct 30 '14 at 00:46
  • 3
    And why is it ugly? The op wants unique constraints and nulls. – ypercubeᵀᴹ Oct 30 '14 at 00:48
  • 1
    @ypercube: OK, removed the assertion about MySQL, I trust you know its behavior. I know Oracle treats NULL and empty string identically, at least in some contexts, I suspected MySQL would treat two NULL as equal in a UNIQUE constraint, but no it doesn't: http://sqlfiddle.com/#!2/78ddc/1. But Oracle does: http://sqlfiddle.com/#!4/5a6db/1 (disappointing for such a big player ...) – Erwin Brandstetter Oct 30 '14 at 01:07
  • Yes, indeed. And it gets more weird. [Oracle marvels: Fiddle-2](http://sqlfiddle.com/#!4/b689c/1). Can you guess what causes this? – ypercubeᵀᴹ Oct 30 '14 at 01:24
  • @ypercube: I guess Larry and his friends smoked bad stuff in the late 70s when they started working on their RDBMS. Seriously, seems like all-null values constitute an actual "composite null" that is treated as not equal to another one, while partial NULL values are treated like an empty string in character types, and therefore equal. Wouldn't want *that* ... – Erwin Brandstetter Oct 30 '14 at 01:33
  • It has nothing to do with strings. It's a side effect of indexes - which do not include rows that have only nulls. If a row (of the index) has some non-null values, then it is included in the index. The only excuse they may have is that their implementation may predate the SQL standard (no idea if that's true) ;) – ypercubeᵀᴹ Oct 30 '14 at 01:49
  • This is an very bad workaround, by changing the constraint to unique and adding a surrogate primary key. – schellingerht Dec 02 '16 at 09:34
1

In case you want a polymorphic relation

Your table uses column names that indicate that they are probably references to other tables:

CREATE TABLE distributors (m_id integer, x_id integer);

So I think you probably are trying to model a polymorphic relation to other tables – where a record in your table distributors can refer to one m record xor one x record.

Polymorphic relations are difficult in SQL. The best resource I have seen about this topic is "Modeling Polymorphic Associations in a Relational Database". There, four alternative options are presented, and the recommendation for most cases is called "Exclusive Belongs To", which in your case would lead to a table like this:

CREATE TABLE distributors (
  id serial PRIMARY KEY,
  m_id integer REFERENCES m,
  x_id integer REFERENCES x,
  CHECK (
    ((m_id IS NOT NULL)::integer + (x_id IS NOT NULL)::integer) = 1
  )
);

CREATE UNIQUE INDEX ON distributors (m_id) WHERE m_id IS NOT NULL;
CREATE UNIQUE INDEX ON distributors (x_id) WHERE x_id IS NOT NULL;

Like other solutions, this uses a surrogate primary key column because primary keys are enforced to not contain NULL values in the SQL standard.

This solution adds a 4th option to the three in @Erwin Brandstetter's answer for how to avoid the case where "you could enter something like (1, NULL) for (m_id, x_id) any number of times without violating the constraint." Here, that case is excluded by a combination of two measures:

  1. Partial unique indexes on each column individually: two records (1, NULL) and (1, NULL) would not violate the constraint on the second column as NULLs are considered distinct, but they would violate the constraint on the first column (two records with value 1).

  2. Check constraint: The missing piece is preventing multiple (NULL, NULL) records, still allowed because NULLs are considered distinct, and anyway because our partial indexes do not cover them to save space and write events. This is achieved by the CHECK constraint, which prevents any (NULL, NULL) records by making sure that exactly one column is NULL.

There's one difference though: all alternatives in @Erwin Brandstetter's answer allow at least one record (NULL, NULL) and any number of records with no NULL value in any column (like (1, 2)). When modeling a polymorphic relation, you want to disallow such records. That is achieved by the check constraint in the solution above.

tanius
  • 14,003
  • 3
  • 51
  • 63