-1

null is the lack of value, or, more theatrically, it is the unkown. From here, it is perfectly logical, that null + a, null * a, null / a, etc. is resulting as null. This means that null is an absorbing element on these operations. I wonder why does it have to be an absorbing element on relations as well. null > 5 could be considered to be false as well, with an explanation at least as plausible as we can give for the current behavior. Currently we can say that null > 5 is null, since the unkown might be greater than 5, or not, so the result is the unkown. But if it was false, then we could say that null > 5 is false, since the lack of value is not greater than 5.

Take a look at these queries:

select *
from books
where author like 'Alex%'

This will return all the books, which have their author starting with Alex. Let us see the other books:

select * 
from books
where author not like 'Alex%'

This will return all the books where author does not start with Alex, right? Wrong! It will return all the books which have an author value which does not start with Alex. If we want to select the books whose author does not start with Alex, we have to explicitly include null values, like this:

select *
from books
where (author is null) or (author not like 'Alex%')

This seems to be an unnecessary complication to me which could be sorted out for future versions. But the question is: what is the explanation of this behavior? Why do we use null as the unkown instead of lack of value?

Lajos Arpad
  • 64,414
  • 37
  • 100
  • 175
  • Add a non nullable constraint to the column, this is by design – Jeremy Thompson Jul 16 '16 at 13:17
  • @JeremyThompson, this question is not seeking for work-arounds. – Lajos Arpad Jul 16 '16 at 13:19
  • Take a look at this answer: http://stackoverflow.com/a/9581790/1073631 – sgeddes Jul 16 '16 at 13:20
  • 1
    By-Design is not a workaround. Product Engineers at any DB company are not going to change this nor will people in charge of the SQL spec. You know the issue, deal with it. – Jeremy Thompson Jul 16 '16 at 13:24
  • 1
    "You cannot compare something to anything which is not comparable, null is nothing" – Abdul Rehman Sayed Jul 16 '16 at 13:25
  • @sgeddes, that's great stuff, however, it states that null is "unkown" as my question described it as well. My question is: why is it feasible for SQL Server to associate the concept of unkown to null instead of lack of value. I have shown a weak point of this behavior and I wonder whether there are strong points. – Lajos Arpad Jul 16 '16 at 13:25
  • Like ask to compare Something with Something, not like include null value (not like = not contain + null value) – Esperento57 Jul 16 '16 at 13:28
  • @JeremyThompson, your proposal of making the column nullable is a work-around. I am sure you know it is not always possible. In my question I was wondering about the reason of this behavior and not about the way of dealing with it. There might be a very good reason for this, but in that case I am not aware of it. I see only the shortcomings. So it is either a good approach for a reason unkown to me, or it is an inferior approach compared to considering null to be lack of value instead of unkown. – Lajos Arpad Jul 16 '16 at 13:30
  • @JeremyThompson, sorry, non-nullable – Lajos Arpad Jul 16 '16 at 13:32
  • @AbdulRehmanSayed, if we cannot compare something to anything which is not comparable and null is not comparable, since it is nothing, then why don't we have a run-time error when we compare a value with null? – Lajos Arpad Jul 16 '16 at 13:34
  • Can you not fix this up with an UPDATE statement and setting a Default Column value. I didn't mean to sound harsh but this is a very common pet hate. Cheers and good luck! – Jeremy Thompson Jul 16 '16 at 13:37
  • 1
    The concept of null in its various forms has been debated for a long time. Go find books by Chris Date and Joe Celko for deeper discussions. You might also be interested in digging into three-value logic. – shawnt00 Jul 16 '16 at 15:50
  • @JeremyThompson, you do not sound harsh, but you seem to not understand my question. I do not want a solution, I can solve my problems. I am interested to know the reasons of the behavior. It seems to be an inferior behavior compared to the one I have described in the context of querying. So either the crew did not foresee the usage, or SQL standards were implemented or there is a good reason. I am interested about the third case, to know whether there are strong points in the approach. I have shown a weak point, but I am open to learning about the strong points. – Lajos Arpad Jul 16 '16 at 19:52
  • @shawnt00, thanks for the pointer. I will surely read those if I find the time to do so. – Lajos Arpad Jul 16 '16 at 19:53

2 Answers2

1

Why do we use null as the unknown instead of lack of value?

Part of the foundation of the Relational Model is predicate logic. While there are logics that have more than two values (true & false), the simplest and best defined, not to mention most familiar, is 2-valued: Boolean logic.

For reasons of industrial acceptance, into that fine mathematical model SQL introduced NULL. In Boolean logic we can prove the value of arbitrary expressions like NOT(A AND B), but there's no provision for missing values. Missing values are, quite simply, outside the domain of Boolean logic.

Having left academe behind, SQL makes arbitrary choices. What is the sum of N NULLs? NULL. What is count of N NULLs? 0. Is a value greater or lesser than NULL? To sort, has to be one or the other. Are two NULLs distinct, or identical, in GROUP BY? The SQL choices all "makes sense" at some level, even when implementations contradict each other. There's no right answer, because they're extra-logical.

So the answer to your question really is, because that's what the vendors chose. The unknown has no more meaning, logically, than lack of value. You could make an argument to treat NULL differently. It might win you a beer. If you want to see it manifested in a DBMS, though, you'll have to get it implemented.

This seems to be an unnecessary complication

You might be right, but you won't be surprised to learn that in 40 years many people have proposed your solution, namely X = NULL is false. The community settled on X = NULL is NULL, avoiding an implicit conversion. Considering how deeply nested and complicated SQL queries can be, that's probably a good thing.

CJ Date takes the position that NULL should be abolished, and all missing values should have a default value. I take exception to that for three reasons:

  1. Missingness is meaningful. If I record a default value for a missing one, I need another column (is_missing) to record its missingness.

  2. Default values can be captured in computations in error. Any use of a complementary is_missing column is ad hoc and outside the purview of the logic engine.

  3. The "right default" varies by context. Sometimes, the "previous" known value is sufficient (because, say, yesterday's price might stand for today's, absent better information). Sometimes there's a known proxy, like average lifespan. Sometimes it's zero, as in a covariance matrix. And sometimes there's no good default: the "value" should be excluded because it's missing.

I have a pet solution, too, that's both simple and strict. I would like to see an SQL option, say, SET STRICT_BOOLEAN ON that would treat missing values as errors for logical and computational purposes. You can insert a NULL; you can select one. You cannot compare one or add one or concatenate one. To do those things, you must supply a default (appropriate to your context) with COALESCE or similar. Any "undefaulted" use of NULL simply raises an error, just like divide by zero does. And for the same reason: like zero as a divisor, NULL in logic is outside the domain.

James K. Lowden
  • 7,574
  • 1
  • 16
  • 31
0

I have not read the answer... But I believe that can help if you are using Oracle. Oracle implements the function LNNVL since Oracle 10 to deal with this.

https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions078.htm

Leon
  • 198
  • 1
  • 7
  • The question is about SQL Server and the point of the question is to find out the cause of a certain behavior, not to work around it. – Lajos Arpad Jul 11 '17 at 09:50