2

I've just come across an interesting scenario on how NULL is handled in T-SQL (and possibly other forms of SQL). The issue is pretty well described and answered by this question and I've illustrated the issue below;

-- SET ANSI_NULLS ON -- Toggle this between ON/OFF to see how it changes behaviour
DECLARE @VAR1 DATETIME 
DECLARE @VAR2 DATETIME 

SET @VAR1 = (SELECT CURRENT_TIMESTAMP)
SET @VAR2 = (SELECT NULL)

-- This will return 1 when ansi_nulls is off and nothing when ansi_nulls is on
SELECT 1 WHERE @VAR1 != @VAR2

DECLARE @TstTable TABLE (
   COL1 DATETIME,
   COL2 DATETIME)

INSERT INTO @TstTable
SELECT @VAR1, @VAR1
UNION 
SELECT @VAR1, NULL

-- This won't ever return a value irrespective of the ansi_nulls setting
SELECT * FROM @TstTable WHERE COL1 != COL2

This situation led me to question my understanding of null representations specifically within SQL. I've always understood null to mean that it has no value. This seems to be an incorrect assumption given the first paragraph of this page. It states (my emphasis...I could quite easily just highlight the whole paragraph though);

A value of NULL indicates the value is unknown. A value of NULL is different from an empty or zero value. No two null values are equal. Comparisons between two null values, or between a NULL and any other value, return unknown because the value of each NULL is unknown.

Does this hold true for T-SQL variable conditions also? It certainly does for my SELECT 1 WHERE @VAR1 != @VAR2 example above, but I don't understand why NULL in this instance is considered "UNKNOWN" and not empty/uninitialised/nothing etc. I know ANSI_NULLS changes how this works, but it is deprecated and will be removed from some future version.

Can someone offer a good explanation as to why NULL in T-SQL refers to an unknown value rather than an uninitialised value? If so, can you extend your answer to show why T-SQL variables with a NULL value are also considered to be unknown?

Community
  • 1
  • 1
Mr Moose
  • 5,946
  • 7
  • 34
  • 69

3 Answers3

3

In SQL, we're interested in storing facts in tables (a.k.a relations).

What Codd asked for was:

Rule 3: Systematic treatment of null values:

The DBMS must allow each field to remain null (or empty). Specifically, it must support a representation of "missing information and inapplicable information" that is systematic, distinct from all regular values (for example, "distinct from zero or any other number", in the case of numeric values), and independent of data type. It is also implied that such representations must be manipulated by the DBMS in a systematic way.

What we've ended up with is three-valued logic (as @zmbq stated). Why is it this way?

We have two items that we're trying to compare for equality. Are they equal? Well, it turns out that we don't (yet) know what item 1 is, and we don't (yet) know what item 2 is (both are NULL). They might be equal. They might be unequal. It would be equally wrong to answer the equality comparison with either TRUE or FALSE. So we answer UNKNOWN.


In other languages, null is usually used with pointers (or references in languages without pointers, but notably not C++), to indicate that the pointer does not, at this time, point to anything.

Community
  • 1
  • 1
Damien_The_Unbeliever
  • 234,701
  • 27
  • 340
  • 448
2

Welcome to Three Valued Logic, where everything can be true, false or unknown.

The value of the null==null is not true, and it's not false, it's unknown...

zmbq
  • 38,013
  • 14
  • 101
  • 171
  • In standard SQL, the third logic value is properly referred to as `UNKNOWN`, not `NULL`. Some implementations (e.g. MySQL) conflate the two, others do not. – Damien_The_Unbeliever Mar 02 '12 at 07:47
  • Thanks for the link. It certainly describes the issue nicely. I think an example I read a long time ago gave an example of storing paper based form data in a database with some fields marked as required and others being multiple choice with a fixed set of answers. It translated nicely to the database with non nullable cols and check constraints etc. I always just assumed that nullable fields that were left as null were empty/nothing. Now TomTom's reference to arithmetic changes my thinking a bit 80 forms, Own a car? 20 said yes, 20 said no...40 UNKNOWN! Ding..another light. – Mr Moose Mar 02 '12 at 08:49
  • Unless of course you collapse the three valued logic into a closed or open world assumption. In a closed world true means that which we can guarantee to be true, and everything else is FALSE. So it's FALSE that null == null because we can't guarantee it. It's also false that null <> null, because we cannot guarantee that either. In an open world assumption everything that we cannot disprove is true. So there, null == null is true, and null <> null is also true. – Haakon Løtveit Mar 14 '18 at 13:27
1

but I don't understand why NULL in this instance is considered "UNKNOWN" and not empty/uninitialised/nothing

?? What is there not to understand. It is like that BECAUSE IT WAS DEFINED LIKE THAT. Someone had the idea it is like that. It was put into the standard.

Yes, this is a little recursive, but quite often design decisions run like that.

This has more to do with arithmetics. Sum of 20 rows with one Null is Null - how would you treat it as unknown? C# etc. react with an exception, but that gets in your way when doing statistical analysis. Uknonwn values have tto move all they come in contact with into unknown, and no unknown is ever the same.

TomTom
  • 61,059
  • 10
  • 88
  • 148
  • What is not to understand?? I think the confusion comes from the use of ANSI_NULLS and how it applies to literal nulls and variable nulls (similar to how I'd expect things to work in application development), but it has no effect on comparison of column values in a result set. I knew that the unknown representation had to have something to do with the result of a query, and I just couldn't think of a situtation...until I saw your example of SUM with one NULL. Ding...a light when on in my head. Thanks. – Mr Moose Mar 02 '12 at 08:36