0

This feels like a very basic question, but I really don't see the obvious answer at the moment.

I have a simple table that maps object ids between two namespaces:

|---------------------|------------------|
|      id_in_ns1      |     id_in_ns2    |
|---------------------|------------------|
|          1          |         5        |
|---------------------|------------------|
|          2          |         17       |
|---------------------|------------------|
|          3          |        NULL      |
|---------------------|------------------|
|         NULL        |         1        |
|---------------------|------------------|

The mapping is basically 1:1, but as you can see, some objects from namespace 1 do not exist in namespace 2, and vice versa, so that there are NULL values in the table.

So, what would be the primary key of this table? As a PK cannot be NULL, I can neither use (id_in_ns1) nor (id_in_ns2) nor the composite.

The only idea I have is to replace NULL by a definite value, say -1, and to use (id_in_ns1, id_in_ns2)as PK. However, this feels not only hackish but also "unnormal" because the non-NULL (or non--1)) value alone is already sufficient to uniquely identify an object.

Remirror
  • 692
  • 5
  • 14
  • SQL doesn't easily support what would correspond to a relational PK or superkey containing null. SQL PK means UNIQUE NOT NULL. UNIQUE (corresponding to a relational superkey) treats rows that are the same but contain null as not SQL duplicates. (Except SQL Server which is non-standard.) Moreover a FK subrow value with a null does not match the same subrow value elsewhere. [What to do with null values when modeling and normalizing?](https://stackoverflow.com/a/40733625/3404097) To identify subrows that hold nulls use a surrogate. – philipxy Jul 29 '19 at 22:22
  • Possible duplicate of [How to have a primary key combination that may have null values?](https://stackoverflow.com/questions/2415928/how-to-have-a-primary-key-combination-that-may-have-null-values) – philipxy Jul 29 '19 at 22:41

2 Answers2

0

Only add entries that have a valid id on both sides. This will effectively get rid of all NULL values, allowing you to specify a proper composite key on (id_in_ns1, id_in_ns2).

Ultimately, those are the values that allow you to identify a single row and you will not lose relevant information - a SELECT id_in_ns2 FROM mapping_table WHERE id_in_ns1 = x will return NULL either way, whether there is a (x, NULL) row or not.

If you insist on keeping those NULLs you could add another column with an artificial (auto incrementing) primary key, but that feels as hacky as using -1.

Marvin
  • 13,325
  • 3
  • 51
  • 57
  • Thanks, but if there was a third namespace and column id_in_ns3, your suggestion would not work. For instance, the first row could be `1 | 5 | NULL` then and if you omit it, you loose the mapping ns1:1=ns2:5. – Remirror Jul 27 '19 at 23:30
  • True. Your question explicitly mentions _two_ namespaces, though. – Marvin Jul 27 '19 at 23:55
0

Use a synthetic primary key and use unique constraints for the rest:

create table mapping (
    mappingId int auto_increment primary key,  -- or whatever for your database
    id_in_ns1 int references ns1(id),
    id_in_ns2 int references ns2(id),
    unique (id_in_ns1),
    unique (id_in_ns2)
);

Just one caveat: some databases only allow one NULL value for UNIQUE constraints. You might need to use a filtered unique index instead (or some other mechanism) for this construct.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786