6

I have a legacy Oracle database that has a strange quirk I wanted to make sense of. It has a composite foreign key where some columns are nullable. To me this smells like a bad design by a careless developer, but I wanted to ask for opinions. Of course the original development team is long gone.

The table is much larger in terms of columns but I think I was able to distill the issue in the example below:

create table quadrant (
  region number(9) not null,
  area number(9) not null,
  caption varchar2(20),
  primary key (region, area)
);

insert into quadrant (region, area, caption) values (10, 123, 'Chicago');
insert into quadrant (region, area, caption) values (10, 125, 'Wisconsin');

create table farm (
  id number(9),
  region_id number(9) not null,
  area_id number(9),
  name varchar2(50),
  constraint fk_region_area foreign key (region_id, area_id)
    references quadrant (region, area)
);

insert into farm (id, region_id, area_id, name) values (5, 10, null, 'farm 1');
insert into farm (id, region_id, area_id, name) values (6, 11, null, 'farm 2');

select * from farm;

Result:

ID  REGION_ID  AREA_ID  NAME
--  ---------  -------  ------
5   10         <null>   farm 1  <-- Does it point to anything?
6   11         <null>   farm 2  <-- Region 11 doesn't even exist!

If parts of a foreign key are null, then what meaning does it have?

  • Is REGION_ID, AREA_ID = (10, null) pointing to anything, or it's just useless info?
  • Is REGION_ID, AREA_ID = (11, null) pointing to anything? I don't think so.

I am tempted to add a constraint to enforce ALL or NONE in terms of null values on the foreign key. Does it make sense?

But above all, what's the use case for this "feature"?

The Impaler
  • 45,731
  • 9
  • 39
  • 76
  • It probably refers to the entire region. If I designed the database, there would be a `Regions` table if any foreign key in another table were called `Region_Id`. – Gordon Linoff Sep 13 '18 at 18:25
  • But even though region = 11 doesn't even exist, the insert succeeds. – The Impaler Sep 13 '18 at 18:29
  • Find out how FKs work. FK matches a PK per MATCH mode. Default SIMPLE means if there's a null then there's a match. This is a faq--obviously from it basic nature. Please always google many clear, concise & specific versions/phrasings of your question/problem/goal with & without your particular strings/names & read many answers. Add relevant keywords you discover to your searches. If you don't find an answer then post, using one variant search for your title & keywords for your tags. See the downvote arrow mouseover text. PS [Related.](https://stackoverflow.com/a/40733625/3404097). – philipxy Sep 13 '18 at 18:40
  • Possible duplicate of [Can table columns with a foreign key be null?](https://stackoverflow.com/questions/2366854/can-table-columns-with-a-foreign-key-be-null) – philipxy Sep 13 '18 at 18:41
  • @philipxy Thanks for the MATCH mode. I think Oracle defaults to "simple" and I can't find any way of switching it to "full". And, yes, I did my research but didn't know how to even name the issue. – The Impaler Sep 13 '18 at 18:42
  • I don't think is a duplicate of https://stackoverflow.com/questions/2366854/can-table-columns-with-a-foreign-key-be-null - That one is about whole nullable FKs, not partially nullable ones. – The Impaler Sep 13 '18 at 18:48
  • Thanks @philipxy, that was unnecessary. – The Impaler Sep 13 '18 at 18:51
  • I don't understand your comment. (My comment was created & posted before I saw your comment re duplicates though I added a sentence after I saw yours.) I don't know what you think is unnecessary. This is an obvious faq & obviously is going to be addressed in documentation re SQL FKs & there are tons of duplicate candidates & you are not using google effectively but you don't seem to realize all this so I am trying to help you by telling you. [ask] – philipxy Sep 13 '18 at 18:58

5 Answers5

11

Thanks to all answers and comments. This issue I have forced me to learn something new, and that's a good thing. @philipxy gave me the big clue. I wanted to recap what I learned since it will probably be of use to someone else, and this is a good place to record it.

This question has two sides: First, what a partially null foreign key means, and second how it's implemented.

Meaning of a Partially Null Foreign Keys

There's a lot of debate of what does this mean -- as @agiles231 points out. NULL could mean:

  • a value is unknown.
  • others say it means the value is invalid.
  • others say NULL is a bona fide value per se.

In short, there's no clear answer to its meaning as of yet.

I guess depending on how people interpret null values, then the strategy to use them in foreign keys (and to validate them) could be different.

Implementation of Partially Null Foreign Keys

The SQL-92 Standard defines (section 4.10.2) three different ways of matching composite foreign keys with nullable values:

  • Match SIMPLE: If any column of the composite foreign key is null, then the foreign key is accepted, stored, but not validated against the referenced table. This is usually the default mode databases offer. In the SQL-92 standard this mode is described but not named.

  • Match PARTIAL: If any column of the composite foreign key is null, then each non-null column is matched against the referenced table to check there is at least one row where the value is present. I've seen no database implementing this mode.

  • Match FULL: Partially null foreign keys are not accepted. Either the foreign key is fully null or fully not null. When null, there's no validation against the referenced table. When not null, it's fully validated against the referenced table. This is what I was expecting as a default behavior (in my blissful ignorance).

Well, I checked how 10 different databases implemented these modes and here's what I found:

Database Engine  Match SIMPLE  Match PARTIAL  Match FULL
---------------  ------------  -------------  ----------
Oracle 12c1      YES*1         NO             NO
DB2 10.5         YES*1         NO             NO
PostgreSQL 10    YES*1         NO             YES
SQL Server 2014  YES*1         NO             NO
MariaDB 10.3     YES*1         NO*2           NO*2
MySQL 8.0        YES*1         NO*2           NO*2
Sybase ASE 16    YES*1         NO             YES
H2 1.4           YES*1         NO             NO
Derby 10.13      YES*1         NO             NO
HyperSQL 2.3     YES*1         NO             YES

*1 This is the default mode.

*2 Accepted when creating a table, but ignored.

In short:

  • All tested databases behave the same way by default: they default to Match SIMPLE.

  • No database I tested supports Match PARTIAL. I guess it makes sense since I personally find little use for it. Moreover, it could become prohibitively expensive to perform partial validation on separate foreign key columns without creating all possible index combinations on the referenced table.

  • PostgreSQL implements Match FULL as well as Sybase ASE. That's great news! Surprisingly HyperSQL (this tiny database) does too.

Workaround to implement Match FULL

The good news is there's an rather easy workaround to implement Match FULL if you happen to need it, in any of the tested databases. Just add a table constraint that allows either all null columns, or all non-null. Something like:

create table farm (
  id int,
  region_id int,
  area_id int,
  name varchar(50),
  constraint fk_region_area foreign key (region_id, area_id)
    references quadrant (region, area),
  constraint fkfull_region_area check ( -- here's the workaround
    region_id is null and area_id is null or
    region_id is not null and area_id is not null)
);

insert into farm (id, region_id, area_id, name) values (5, 10, null, 'farm 1'); -- fails

insert into farm (id, region_id, area_id, name) values (6, 11, null, 'farm 2'); -- fails

insert into farm (id, region_id, area_id, name) values (7, 10, 125, 'farm 3'); -- succeeds

insert into farm (id, region_id, area_id, name) values (8, null, null, 'farm 4'); -- succeeds

It works pretty neat.

Finally, and as a very personal opinion, I would have expected Match FULL to be the default matching strategy. Maybe it's just that to me allowing (by default) foreign keys that do not point to other rows encourages errors in the applications that use the database.

I think that most developers will understand FULL easily, compared to SIMPLE. And PARTIAL is way more complicated, and potentially error prone. Just my opinion.

The Impaler
  • 45,731
  • 9
  • 39
  • 76
1

There is a lot of debate on the use of nulls to mean something. Some would argue null means that the value is unknown or that it represents invalidity, others would argue that it is an actual value itself. I suspect in this case, it represents unknown. Suppose that you were documenting the locations of farms in a county from 100 years prior to now. Using some local history books, you have mapped out 70% of the existing farms from that period and their exact boundaries (or thereabouts), but for the remaining 30%, some have known regions, and some are only known to exist. In this case, I would definitely say that a null foreign key makes sense. It is simply unknown information.

agiles231
  • 71
  • 5
1

Some guesses as the the case for your "feature": Perhaps it may be that the area field only applies to some farms? Example: A farm with a designated area is required to pay some surcharge or tax (guessing here, since I don't know your data)? In this case the NULL means something (not required to pay). Maybe there are farms that existed prior to the implementation of "area", and thus were never assigned one? In this case, the NULL really means NULL, as the area never existed, thus is unknown.

Leia
  • 190
  • 1
  • 4
  • However, 11 is not a valid value by any means, and is still inserted successfully. I guess (my guess) is that in the presence of nulls the whole FK is maybe considered "not yet valid", or "transient". It seems it can be stored, but doesn't really point to anything concrete yet. Again, just a guess. – The Impaler Sep 13 '18 at 18:44
1

I don't know if it makes sense for your data model, but there are definite use cases for partially NULL foreign keys.

Consider a simple table for fixed assets (computers, cars, building, etc -- things that accountants will depreciate). Suppose they want to know where an asset is in use, so they have two columns: company_id and department_id.

Some assets, like buildings, may be shared across departments, so I would expect a foreign key like (123, null). I would also expect a separate foreign key to just the COMPANY table on COMPANY_ID.

The meaning of such a setup is that the company_id must be a known value, and the company/department combination, if it exists, must be a known combination.

UPDATE

I am not sure why you think Oracle cannot do what I am describing. Here is a simple test:

CREATE TABLE tst_company 
  ( company_id NUMBER NOT NULL PRIMARY KEY );

CREATE TABLE tst_department
  ( company_id NUMBER NOT NULL,
    department_id NUMBER NOT NULL,
    CONSTRAINT tst_department_pk PRIMARY KEY ( company_id, department_id ),
    CONSTRAINT tst_department_f1 FOREIGN KEY ( company_id ) REFERENCES tst_company ( company_id ) );

CREATE TABLE tst_asset
  ( asset_id NUMBER NOT NULL PRIMARY KEY,
    company_id NUMBER NOT NULL,
    department_id NUMBER,
    CONSTRAINT tst_asset_f1 FOREIGN KEY ( company_id ) REFERENCES tst_company ( company_id ),
    CONSTRAINT tst_asset_f2 FOREIGN KEY ( company_id, department_id ) REFERENCES tst_department ( company_id, department_id ) );

INSERT INTO tst_company ( company_id ) VALUES (1);
INSERT INTO tst_department ( company_id, department_id ) VALUES (1, 10);
INSERT INTO tst_asset ( asset_id, company_id, department_id ) VALUES (1001, 1, 10);  -- Department specific asset
INSERT INTO tst_asset ( asset_id, company_id, department_id ) VALUES (1002, 1, NULL);  -- Non-department specific asset

INSERT INTO tst_asset ( asset_id, company_id, department_id ) VALUES (1003, 2, NULL);  -- Bad company - fails
INSERT INTO tst_asset ( asset_id, company_id, department_id ) VALUES (1004, 1, 11);  -- Bad department - fails
INSERT INTO tst_asset ( asset_id, company_id, department_id ) VALUES (1005, 2, 11);  -- Bad company AND department - fails
Matthew McPeak
  • 17,705
  • 2
  • 27
  • 59
  • After reading @philipxy comment, it seems that Oracle cannot do what you describe, that is "match mode PARTIAL". It seems that PostgreSQL and MySQL do implement it (though I haven't verified it). In simple words, in the presence of any null, Oracle stores the FK without validating it. That is known as "match mode SIMPLE". Thanks for the insight. – The Impaler Sep 13 '18 at 19:10
  • Please check the example I posted and let me know what I am not understanding here. – Matthew McPeak Sep 13 '18 at 19:26
  • Yes, your example is good since you added the constraint `tst_asset_f1`. That allows you to validate `company_id` in the absence of `department_id`. If you removed this constraint Oracle would not be able to perform this validation natively (and inexistent values of `company_id` could be inserted): that would require the match mode "partial" that it doesn't implement. Good trick, nevertheless (obvious but I just didn't think about it). – The Impaler Sep 13 '18 at 19:44
1

Matthew's answer shows you how to work around this. As to why this happens, remember:

Constraints only reject rows where the condition is false.

Comparing anything to null => unknown. So null values pass constraints unless you specifically test for them. Which leads to the orphaned rows you demonstrate.

As the docs say:

If any column of a composite foreign key is null, then the non-null portions of the key do not have to match any corresponding portion of a parent key

In most cases this is a mistake or oversight by the original designers.

Chris Saxon
  • 9,105
  • 1
  • 26
  • 42
  • Thanks, after reading all possibilities here, I think it just was an oversight. For my specific case the simple solution is to make `area_id not null`. – The Impaler Sep 14 '18 at 15:47