
I once read that having nullable types is an absolute evil. I believe it was in an article written by the very person who created them (in Ada?). I believe this is the article.

Anyway, so what if by default a language like C# used non-nullable types? How would you replace some of the common idioms in C# or Ruby or any other common language where null is an acceptable value?

Earlz
  • You have this problem with primitives anyway, don't you? If you have `double x;`, how can you tell whether `x` was initialized or not? – KLee1 Aug 03 '10 at 03:07
  • @KLee yes I know, but ignore C# specifics. The reason I'm asking this question is because I'm designing my own language and am taking the non-nullable debate into consideration. – Earlz Aug 03 '10 at 03:09
  • That wasn't specific to C# as far as I know. `double` is a non-nullable type in most languages I think. You would have to deal with objects not being null the same way you would deal with deciding whether a `double` was initialized. – KLee1 Aug 03 '10 at 03:12
  • [Tony Hoare: Null References, The Billion Dollar Mistake](http://qconlondon.com/london-2009/presentation/Null+References:+The+Billion+Dollar+Mistake). There is also a [video of the presentation](http://www.infoq.com/presentations/Null-References-The-Billion-Dollar-Mistake-Tony-Hoare). (And [interviews](http://www.simple-talk.com/opinion/geek-of-the-week/sir-tony-hoare-geek-of-the-week/) and such about this topic.) – sth Aug 03 '10 at 03:14
  • Maybe `int` is a better example than `double`, though. The usual IEEE floating-point implementation of `double` offers `NaN` values, which can often be a reasonable substitute for a `null` value. – bcat Aug 03 '10 at 03:16
  • @bcat: You can still operate on a NaN and get useless results; you can't on a null. – bltxd Oct 21 '10 at 15:49
  • @bltxd: True, though that can be mitigated to some extent if your language supports signaling NaNs. – bcat Oct 22 '10 at 20:41
  • possible duplicate of [Implications of not including NULL in a language?](http://stackoverflow.com/questions/1442463/implications-of-not-including-null-in-a-language), [about-the-non-nullable-types-debate](http://stackoverflow.com/questions/641328/about-the-non-nullable-types-debate) – nawfal Jul 08 '14 at 10:36
  • `NaN` is not `null` ... it means you have a value, but it doesn't make sense in the current context; `null` means you literally have nothing. – robert Sep 06 '14 at 12:08

11 Answers


Instead of outright declaring that nullable types are evil, I would posit: most languages graft nullability onto entire kinds of types, when the two concepts should really be orthogonal.

For example, all non-primitive Java types (and all C# reference types) are nullable. Why? We can go back & forth, but ultimately I'll bet the answer comes down to "it was easy". There's nothing intrinsic to the Java language that demands widespread nullability. C++ references offered a fine example of how to exorcise nulls at the compiler level. Of course, C++ has a lot more ugly syntax that Java was explicitly trying to curtail, so some good features ended up on the cutting-room floor alongside the bad.

Nullable value types in C# 2.0 offered a step in the right direction -- decoupling nullability from unrelated type semantics, or worse, CLR implementation details -- but it's still missing a way to do the opposite with reference types. (Code contracts are great & all, but they're not embedded in the type system the way we're discussing here.)
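To make that asymmetry concrete, here is a minimal sketch (demo code, not from the original answer): `Nullable<T>` lets you opt a value type *into* nullability, but there is no counterpart for opting a reference type *out* of it.

using System;

class NullableDemo
{
    static void Main()
    {
        // Nullable<T> opts a value type *into* nullability on demand.
        int? maybeCount = null;              // shorthand for Nullable<int>

        if (maybeCount.HasValue)
            Console.WriteLine(maybeCount.Value);

        // There is no opt-*out* for reference types: a "string that
        // can never be null" is not a type the C# compiler will check.
        string s = null;                     // compiles without complaint
        Console.WriteLine(s == null);        // prints: True
    }
}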

Plenty of functional or otherwise obscure languages got these concepts "straight" from the beginning...but if they were in widespread use, we wouldn't be having this discussion...

To answer your question: banning nulls from a modern language, wholesale, would be just as foolish as the so-called "billion dollar mistake." There are valid programming constructs where nulls are nice to have: optional parameters, any sort of default/fallback calculation where the coalesce operator leads to concise code, interaction with relational databases, etc. Forcing yourself to use sentinel values, NaN, etc would be a "cure" far worse than the disease.
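As a small illustration of those constructs, here is a hedged C# sketch combining the null-coalescing operator (`??`) with a null default on an optional parameter (the `Greet` helper is hypothetical; optional parameters with defaults arrived in C# 4.0):

using System;

class CoalesceDemo
{
    // Hypothetical helper: null marks "no argument supplied".
    static string Greet(string name = null)
    {
        // The coalescing operator keeps the fallback concise.
        return "Hello, " + (name ?? "stranger");
    }

    static void Main()
    {
        Console.WriteLine(Greet());         // Hello, stranger
        Console.WriteLine(Greet("Earlz"));  // Hello, Earlz
    }
}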

That said, I'll tentatively agree with the sentiment expressed in the quote, so long as I may elaborate to fit my own experience:

  1. the # of situations where nulls are desirable is smaller than most people think
  2. once you introduce nulls into a library or codepath, it's much harder to get rid of them than it was to add them. (so don't let junior programmers do it on a whim!)
  3. nullable bugs scale with variable lifetime
  4. corollary to #3: crash early
Richard Berg
  • In languages that don't have a nice way to define optional parameters, you may (or may not, depending on how creative you want to be) need NULLs for that. But in general, NULLs are not necessary for implementing optional parameters. – slebetman Aug 03 '10 at 03:50
  • True. Just because nulls can be used to implement optional parameters doesn't make them the *best* way -- especially if we start talking about specific languages which may have their own custom syntax sugar. – Richard Berg Aug 03 '10 at 03:56
  • I hear you on the Nullable value types in C# thing. Nullable is a little bit clunky to use, but that's just a syntax issue. (I would rather access the value of `var` with just `var` itself, not `var.Value`.) Creating something to go the other way -- a type which is reference/nullable by default but can be optionally transformed into value/non-nullable -- would definitely require more than a generic class and syntactic sugar. – Brian S Aug 03 '10 at 04:25
  • There are two ways one could make types non-nullable: (1) allow a particular fixed default value; (2) require that any creation of an object or array slot holding a field of that type result in a call to a type-specific constructor, which must complete before the containing object is exposed anywhere. The first choice could be useful in some contexts, but in many cases the most sensible default value is a trap representation. The second choice could sometimes be useful, but would in many cases slow down the common case (where every slot in the array would be rewritten before it's read). – supercat May 30 '12 at 15:30

We'd use option types for the (very) few places where allowing a null value is actually desirable, and we'd have a lot less obscure bugs since any object reference would be guaranteed to point to a valid instance of the appropriate type.
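To make the idea concrete, here is a rough sketch of a hand-rolled option type in C# (the `Option<T>` name and `Match` method are hypothetical, loosely modeled on ML-family option types): the wrapped value can only be reached by handling both cases, so there is no way to "forget the null check".

using System;

// Hypothetical Option<T>: the only way at the value is Match, which
// forces callers to handle the None case as well.
struct Option<T>
{
    private readonly T value;
    private readonly bool hasValue;

    private Option(T value)
    {
        this.value = value;
        this.hasValue = true;
    }

    public static Option<T> Some(T value) { return new Option<T>(value); }
    public static Option<T> None { get { return default(Option<T>); } }

    public R Match<R>(Func<T, R> some, Func<R> none)
    {
        return hasValue ? some(value) : none();
    }
}

class OptionDemo
{
    static void Main()
    {
        Option<int> found = Option<int>.Some(42);
        Option<int> missing = Option<int>.None;

        // Both branches are required; the compiler enforces the check.
        Console.WriteLine(found.Match(v => v.ToString(), () => "none"));
        Console.WriteLine(missing.Match(v => v.ToString(), () => "none"));
    }
}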

bcat
  • How is an option type conceptually different from a nullable type? – Gabe Aug 03 '10 at 03:27
  • It's not. It's just that the default is _not_ to permit null, and you explicitly signal that null is permissible on those occasions where it is desirable. – Richard Wolf Aug 03 '10 at 03:29
  • @Gabe An option type is a nullable type. Languages that have distinct types, however, usually have the property that types are non-nullable. Some types (such as float or list) may have a sort of null built into their domain. The option type gives us a way to introduce nullability to other types where it is needed. The problem isn't the existence of nullability; it is the omnipresence of it. If you build your language non-nullable, option types let you re-introduce nullability where appropriate. – Michael Ekstrand Aug 03 '10 at 03:31
  • [Edit: ninja'd :)] It really isn't, as far as I know, except that option types explicitly indicate that a value can be null/unset, while nullable types as implemented in C#, Java, etc. can be null/unset by default, without any explicit declaration. – bcat Aug 03 '10 at 03:31
  • bcat: In C#, value types can either be nullable or not (e.g. `int` or `int?`), but reference types are always nullable. – Gabe Aug 03 '10 at 05:27
  • @Gabe An option type is different from a nullable type in that the compiler won't let you use the value without first checking for null. In F#, it looks like `match opt with | Some(value) -> do_something_with value | None -> oops_its_null`. If you tried to do `do_something_with opt`, you would get a compile-time type error. – Jason Orendorff Aug 04 '10 at 22:08
  • This is the correct answer. It's a sad commentary on the SO voting population that it hasn't gotten more upvotes. – Jason Orendorff Aug 05 '10 at 14:09
  • @Gabe: there are some differences between option types and nullable types. One is that you can have an option option (i.e. a type whose values are None, Some(None) and Some(x)), which is especially important in languages with parametric polymorphism. Another difference is that it's one less concept to learn (for the programmer) or implement (for the implementer): option is just one of the many data structures (0 or 1 x), like list (any number of x in order), array (a certain fixed number of x), pair (an x and a y), … – Gilles 'SO- stop being evil' Aug 05 '10 at 20:19

Haskell is a powerful language that doesn't have the concept of nullity. Basically, every variable must be initialized to a non-null value. If you want to represent an "optional" variable (the variable may have a value but it may not), you can use a special "Maybe" type.

It's easier to implement this system in Haskell than in C# because data is immutable in Haskell, so it doesn't really make sense to have a null reference that you later populate. In C#, however, the last link in a linked list may hold a null pointer to the next link, which is populated when the list expands. I don't know what a procedural language without null types would look like.
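For the linked-list case specifically, one answer is to make "end of list" its own variant instead of a null next-pointer, mirroring Haskell's `data List a = Nil | Cons a (List a)`. A hedged sketch in C# (hypothetical class names; the pattern match needs C# 7 or later):

using System;

// A null-free singly linked list: "end of list" is an explicit Empty
// variant rather than a null Next pointer.
abstract class LinkedList<T>
{
    public sealed class Empty : LinkedList<T> { }

    public sealed class Cons : LinkedList<T>
    {
        public readonly T Head;
        public readonly LinkedList<T> Tail;  // never null: Empty marks the end

        public Cons(T head, LinkedList<T> tail)
        {
            Head = head;
            Tail = tail;
        }
    }
}

class ListDemo
{
    static void Main()
    {
        LinkedList<int> list =
            new LinkedList<int>.Cons(1,
            new LinkedList<int>.Cons(2,
            new LinkedList<int>.Empty()));

        LinkedList<int> node = list;
        while (node is LinkedList<int>.Cons c)  // type pattern, C# 7+
        {
            Console.WriteLine(c.Head);
            node = c.Tail;
        }
    }
}

Note that growing this list means building a new `Cons` rather than mutating a null `Next` field, which is exactly the immutable style the answer describes.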

Also, note that many people here seem to be suggesting replacing nulls with type-specific logical "nothing" values (999-999-9999, "NULL", etc.). These values don't really solve anything, because the problem people have with nulls is that they are a special case, and people forget to code for the special case. With type-specific logical nothing values, people STILL forget to code for the special case, yet they now avoid the runtime errors that would catch the mistake, which is a bad thing.

gdj

I think you are referring to this talk: "Null References: The Billion Dollar Mistake"

Darrel Miller

You can adopt a simple rule: all variables are initialized (by default; this can be overridden) to an immutable value defined by the variable's class. For scalars, this would usually be some form of zero. For references, each class would define what its "null" value is, and references would be initialized with a pointer to this value.

This would effectively be a language-wide implementation of the Null Object pattern: http://en.wikipedia.org/wiki/Null_Object_pattern So it doesn't really get rid of null objects; it just keeps them from being special cases that must be handled as such.
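A hedged sketch of what that pattern looks like at the library level in C# (the `ILogger`/`NullLogger` names are hypothetical, not from any particular framework):

using System;

interface ILogger
{
    void Log(string message);
}

class ConsoleLogger : ILogger
{
    public void Log(string message) { Console.WriteLine(message); }
}

// The null object: a real instance that safely does nothing.
class NullLogger : ILogger
{
    public static readonly NullLogger Instance = new NullLogger();
    public void Log(string message) { }  // deliberately a no-op
}

class Service
{
    private readonly ILogger log;

    // Callers that pass nothing get the null object, never null itself.
    public Service(ILogger logger)
    {
        log = logger ?? NullLogger.Instance;
    }

    public void DoWork()
    {
        log.Log("working...");  // no null check needed anywhere
    }
}

As the comments below point out, the price is that a forgotten dependency now fails silently instead of crashing early.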

ergosys
  • No, they are still special cases that must be handled as such. You just end up with errors that are harder to debug because they are silently ignored instead of immediately raising an exception. – Gabe Aug 03 '10 at 03:48
  • If(foo == 0) is not any more elegant than If(foo == null). In many cases, it's *more problematic* than allowing nulls in the first place. I know you're just answering the question, not defending Hoare's position per se, but I couldn't stop myself from commenting... – Richard Berg Aug 03 '10 at 03:59
  • @Gabe, if the null value is intended to mean something other than "do nothing", then a test for the null value would be needed. The places for these tests I think would come out of testing and design, but I've never used a language which worked this way (hey, maybe there's a reason!), so I don't know how well it would work in practice. – ergosys Aug 03 '10 at 04:11
  • @Richard, I'm not seeing your point. The idea would be that references would be initialized to point to a null object that would essentially do nothing for the operations defined for it, avoiding the need to check in most cases that indeed that reference does point to the null value object. You would still be able to differentiate null object references from non-null object references by comparing references, e.g., if myclassobject == myclass.null, but you wouldn't need to in most cases. – ergosys Aug 03 '10 at 04:24
  • I understand the pattern. And I agree it's sometimes the only way, if your language is particularly unexpressive. I just think it's going to be worse. At least a nullref will crash early and provide a stack trace. An improper MyClass.Null could go undetected indefinitely. There might be numerically fewer of them, but debugging sounds far more insidious. – Richard Berg Aug 03 '10 at 04:49
  • When `null` is a sentinel value, the compiler can tell me every time I need to check for it or insert its own check that throws an exception. With the null object pattern, I just have to guess where I need to put in the checks, and history has shown that programmers are bad at putting in such checks. – Gabe Aug 03 '10 at 05:21
  • There's a difference between an `int` that hasn't been properly initialized and one that has a value of zero. Moreover, no default value will work properly. We'd ideally like `uninitialized + 3 == uninitialized`, but in reality the result will have a different value, and we will have lost the initialization information. (This does work for IEEE floating-point types, assuming you can initialize them to some form of NaN.) In other words, using this scheme you have to explicitly test an `int` any time it might possibly be uninitialized, and I don't see how that's an improvement over anything. – David Thornley Aug 03 '10 at 15:47
  • @David, from the programmer's point of view, there are no uninitialized variables with this scheme. There are some languages that guarantee scalars are by default initialized to zero, and it works reasonably well for all scalar types, including floats. I don't know any that initialize references to an actual object, which would be the difference here. – ergosys Aug 03 '10 at 16:20
  • @ergosys: However, "initialized" doesn't mean a whole lot in such an environment. A variable initialized to an arbitrary value isn't any more useful than one that's garbage because it's uninitialized - not unless you can and do test for initialization. I'd rather stick with compiler warnings about uninitialized variables. – David Thornley Aug 03 '10 at 17:20

Null is not the problem; the problem is the language allowing you to write code that accesses values that might be null.

If the language simply required any pointer access to be checked first, or the pointer to be converted to a non-nullable type, 99% of null-related bugs would go away. For example, in C++:

void fun(foo *f)
{
    f->x;                  // error: possibly null
    if (f)              
    {
        f->x;              // ok
        foo &r = *f;       // ok, convert to non-nullable type
        if (...) f = bar;  // possibly null again
        f->x;              // error
        r.x;               // ok
    }
}

Sadly, this can't be retrofitted to most languages, as it would break a lot of code, but would be quite reasonable for a new language.

Aardappel

Tcl is one language that not only lacks the concept of null but in which the concept of null itself is at odds with the core of the language. In Tcl we say: "everything is a string". What that really means is that Tcl has strict value semantics (which just happen to default to strings).

So what do Tcl programmers use to represent "no data"? Mostly, the empty string. In cases where the empty string could itself be valid data, it's typically one of the following:

  1. Use the empty string anyway - the majority of the time it makes no difference to the end user.

  2. Use a value you know won't exist in the data stream - for example, the string "_NULL_", the number 9999999, or my favourite, the NUL byte "\0".

  3. Wrap a data structure around the value - the simplest is a list (what other languages call arrays). A list of one element means the value exists; zero elements means null.

  4. Test for the existence of the variable - [info exists variable_name].

It is interesting to note that Tcl is not the only language with strict value semantics. C also has strict value semantics but the default semantics of values just happen to be integers rather than strings.

Oh, I almost forgot another one:

Some libraries use a variation of number 2 that allows the user to specify what the placeholder for "no data" is. Basically, they allow you to specify a default value (and if you don't, the default usually falls back to the empty string).

slebetman
  • Yes, so in Tcl you basically have something that quite resembles null but ends up being worse. Using magic values is WAY worse than using nulls. – devoured elysium May 09 '11 at 07:28

We'd create all kinds of strange constructs to convey the message of an object 'being invalid' or 'not being there', as seen in the other answers; it's a message that null can convey very well.

  • The Null Object pattern has its disadvantages, as I explained here.
  • Domain-specific nulls. This forces you to check for magic numbers, which is bad.
  • Collection wrappers, where an empty collection means 'no value'. Nullable wrappers would be better, but that doesn't differ much from checking for null or using the Null Object pattern.

Personally, I would write some C# preprocessor that allows me to use null. This would then map to some dynamic object, which throws a NullReferenceException whenever a method is invoked on it.
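The dynamic half of that idea is already expressible in C# 4.0 without a preprocessor. A hedged sketch using the real `System.Dynamic.DynamicObject` base class (the `Null` class name is hypothetical):

using System;
using System.Dynamic;

// A stand-in for null that fails loudly: any member access or method
// call on it throws NullReferenceException at the point of use.
class Null : DynamicObject
{
    public override bool TryInvokeMember(
        InvokeMemberBinder binder, object[] args, out object result)
    {
        throw new NullReferenceException("Called " + binder.Name + " on null");
    }

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        throw new NullReferenceException("Read " + binder.Name + " on null");
    }
}

class NullDemo
{
    static void Main()
    {
        dynamic value = new Null();
        value.ToUpper();  // throws NullReferenceException
    }
}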

Back in 1965, null references may have looked like a mistake. But nowadays, with all kinds of code analysis tools that warn us about null references, we don't have to worry that much. From a programming perspective, null is a very valuable keyword.

Niels van der Rest
  • Parametric option types are domain-specific, easily checkable both by compilers and readers. Null has no legitimate uses besides low-level implementations. – bltxd Oct 21 '10 at 15:46

Realistically speaking, in any powerful programming language that allows pointers or object references in the first place, there are going to be situations where code will be able to access pointers which have not had any initialization code run upon them. It may be possible to guarantee that such pointers will be initialized to some static value, but that doesn't seem terribly useful. If a machine has a general means of trapping accesses to uninitialized variables (be they pointers or something else), that's better than special-casing null pointers, but otherwise the biggest null-related mistakes I see occur in implementations that allow arithmetic with null pointers. Adding 5 to a (char*)0 shouldn't yield a character pointer to address 5; it should trigger an error (if it's appropriate to create pointers to absolute addresses, there should be some other means of doing it).

supercat

What would we do without NULL? Invent it! :-) You don't have to be a rocket scientist to use 0 if you are looking for an in-band pointer value that expresses "not actually a pointer".

Jens

We use either:

  1. Discriminators. An extra attribute or flag or indicator that says that a value is "null" and must be ignored.

  2. Domain-Specific Nulls. A specific value -- within the allowed domain -- that is interpreted as "ignore this value". For example, a social security number of 999-99-9999 could be a domain-specific null value that says the SSN is either unknown or not applicable.
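A hedged sketch of both techniques in C# (the `SsnRecord` type and `UnknownSsn` constant are hypothetical illustrations):

using System;

// Technique 1: a discriminator flag travels alongside the value.
struct SsnRecord
{
    public bool IsKnown;   // the discriminator
    public string Value;
}

class SentinelDemo
{
    // Technique 2: a domain-specific null, reserved inside the legal
    // domain of values (the answer's 999-99-9999 example).
    const string UnknownSsn = "999-99-9999";

    static void Main()
    {
        SsnRecord record = new SsnRecord { IsKnown = false, Value = "" };
        if (!record.IsKnown)
            Console.WriteLine("SSN not on file (discriminator)");

        string fromFile = "999-99-9999";
        if (fromFile == UnknownSsn)
            Console.WriteLine("SSN not on file (domain-specific null)");
    }
}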

S.Lott