149

Is this piece of code valid (and defined behavior)?

int &nullReference = *(int*)0;

Both g++ and clang++ compile it without any warning, even when using -Wall, -Wextra, -std=c++98, -pedantic, -Weffc++...

Of course the reference is not actually null, since it cannot be accessed (it would mean dereferencing a null pointer), but we could check whether it's null or not by checking its address:

if( & nullReference == 0 ) // null reference
Jan Schultke
peoro
  • Can you give any case where this would actually be useful? In other words, is this just a theory question? – cdhowie Dec 06 '10 at 08:42
  • Well, are references ever indispensable? Pointers can always be used instead of them. Such a _null reference_ would let you use a reference also when you could have no object to refer to. Don't know how dirty it is, but before thinking of it I was interested about its legality. – peoro Dec 06 '10 at 08:53
  • I think it's [frowned upon](http://www.gotw.ca/conv/002.htm) – default Dec 06 '10 at 08:55
  • "we could check" - no, you can't. There are compilers that turn the statement into `if (false)`, eliminating the check, precisely because references can't be null anyway. A better documented version existed in the Linux kernel, where a very similar NULL check was optimized out: http://isc.sans.edu/diary.html?storyid=6820 – MSalters Dec 06 '10 at 08:56
  • "one of the major reasons to use a reference instead of a pointer is to free you from the burden of having to test to see if it refers to a valid object" this answer, in Default's link, sounds pretty good! – peoro Dec 06 '10 at 09:22
  • yes it's fine, but some persons are going to make suicide if you will use it :) – Kos Feb 10 '17 at 13:11

4 Answers

94

References are not pointers.

8.3.2/1:

A reference shall be initialized to refer to a valid object or function. [Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior. As described in 9.6, a reference cannot be bound directly to a bit-field. ]

1.9/4:

Certain other operations are described in this International Standard as undefined (for example, the effect of dereferencing the null pointer)

As Johannes says in a deleted answer, there's some doubt whether "dereferencing a null pointer" should be categorically stated to be undefined behavior. But this isn't one of the cases that raise doubts, since a null pointer certainly does not point to a "valid object or function", and there is no desire within the standards committee to introduce null references.
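For the use case raised in the comments (a reference that may have nothing to refer to), the usual approach is to keep the "maybe nothing" state in a pointer and bind a reference only after the null check. A minimal sketch; `findFirstNegative` and `negateFirstNegative` are hypothetical names chosen purely for illustration:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns a pointer to the first negative element,
// or nullptr when there is none. "No object" is a valid pointer value.
int* findFirstNegative(std::vector<int>& values) {
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] < 0) {
            return &values[i];
        }
    }
    return nullptr;
}

// Binds a reference only after the pointer has been checked.
bool negateFirstNegative(std::vector<int>& values) {
    int* hit = findFirstNegative(values);
    if (hit == nullptr) {
        return false;      // nothing to refer to, so no reference is formed
    }
    int& value = *hit;     // well-defined: hit points to a live element
    value = -value;
    return true;
}
```

The null test stays on the pointer, where the compiler has to honor it; the reference never exists in the "no object" case.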

Steve Jessop
  • I removed my answer since I realized that the mere issue of dereferencing a null pointer and getting an lvalue that refers to that is a different thing than actually binding a reference to it, as you mention. Although lvalues are said to refer to objects or functions too (so in this point, there really isn't a difference to a reference binding), these two things still are separate concerns. For the mere act of dereferencing, here's the link: http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1102 – Johannes Schaub - litb Dec 06 '10 at 09:03
  • @MSalters (reply to comment on deleted answer; relevant here) I can't particularly agree with the logic presented there. While it may be convenient to elide `&*p` as `p` universally, that doesn't rule out undefined behaviour (which by its nature may "seem to work"); and I disagree that a `typeid` expression which seeks to determine the type of a "dereferenced null pointer" actually dereferences the null pointer. I've seen people argue seriously that `&a[size_of_array]` can't and shouldn't be relied upon, and anyway it is easier and safe to just write `a + size_of_array`. – Karl Knechtel Dec 06 '10 at 09:08
  • @Default Standards in the [c++] tags should be high. My answer sounded like both acts were one and the same thing :) While dereferencing and getting an lvalue you don't pass around that refers to "no object" could be feasibly, storing it into a reference escapes that limited scope and suddenly could impact much more code. – Johannes Schaub - litb Dec 06 '10 at 09:09
  • @Karl well in C++, "dereferencing" doesn't mean to read a value. Some people think "dereference" means to actually access or modify the stored value, but that's not true. The logic is that C++ says that an lvalue refers to "an object or function". If that is so, then the question is what the lvalue `*p` refers to, when `p` is a null pointer. C++ currently does not have the notion of an empty lvalue, which the issue 232 wanted to introduce. – Johannes Schaub - litb Dec 06 '10 at 09:12
  • Detection of dereferenced null pointers in `typeid` works based on syntax, instead of based on semantics. That is, if you do `typeid(0, *(ostream*)0)` you *do* have undefined behavior - no `bad_typeid` is guaranteed to be thrown, even though you pass an lvalue resulting from a null pointer dereference semantically. But syntactically at the toplevel, it's not a dereference, but a comma operator expression. – Johannes Schaub - litb Dec 06 '10 at 09:15
  • @Karl: you may not realise it, but you're repeating issue 232 (http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232). Your "off the end" example can't be relied upon according to the standard - at least some of the authors want it to be reliable, although they don't seem to have done anything about it in C++0x. So I think you're basically right: it's undefined because the standard doesn't define it. It's interesting to note that some folks want to change that, though. – Steve Jessop Dec 06 '10 at 09:18
  • @Johannes I understood that; I didn't think it was relevant to my argument. But now I've considered the fact that `typeid` is a runtime operator and I see the problem... – Karl Knechtel Dec 06 '10 at 09:22
  • @Steve There seems to be confusion all around about who is discussing what the standard **does** mean versus what it **ought to** mean. :/ – Karl Knechtel Dec 06 '10 at 09:25
  • What if I bind it to a valid pointer and then free the pointer. Is this equivalent of a null reference or even is behaviour – Allahjane Feb 23 '17 at 17:36
  • @Allahjane: references and pointers to objects whose memory has been freed are dealt with elsewhere in the spec. I don't remember off-hand the section numbers that deal with it, but it's nothing to do with null pointers. – Steve Jessop Mar 07 '17 at 09:42
  • Just to confirm; this remains the case through C++14, 17, and 20, yeah? – Jason C Dec 19 '20 at 00:57
66

The answer depends on your point of view:


If you judge by the C++ standard, you cannot get a null reference because you get undefined behavior first. After that first instance of undefined behavior, the standard allows anything to happen. So, if you write *(int*)0, you already have undefined behavior, because from the language standard's point of view you are dereferencing a null pointer. The rest of the program is irrelevant: once this expression is executed, you are out of the game.


However, in practice, null references can easily be created from null pointers, and you won't notice until you actually try to access the value behind the null reference. Your example may be a bit too simple, as any good optimizing compiler will see the undefined behavior, and simply optimize away anything that depends on it (the null reference won't even be created, it will be optimized away).

Yet, that optimization depends on the compiler being able to prove the undefined behavior, which may not be possible. Consider this simple function inside a file converter.cpp:

int& toReference(int* pointer) {
    return *pointer;
}

When the compiler sees this function, it does not know whether the pointer is a null pointer or not. So it just generates code that turns any pointer into the corresponding reference. (Btw: This is a no-op, since pointers and references are the exact same beast at the assembly level.) Now, if you have another file user.cpp with the code

#include <iostream>
#include "converter.h"

void foo() {
    int& nullRef = toReference(nullptr);
    std::cout << nullRef;    // crash (most likely) happens here
}

the compiler does not know that toReference() will dereference the passed pointer, and assumes that it returns a valid reference, which will happen to be a null reference in practice. The call succeeds, but when you try to use the reference, the program crashes. Hopefully. The standard allows for anything to happen, including the appearance of pink elephants.

You may ask why this is relevant; after all, the undefined behavior was already triggered inside toReference(). The answer is debugging: Null references may propagate and proliferate just as null pointers do. If you are not aware that null references can exist, and have not learned to avoid creating them, you may spend quite some time trying to figure out why your member function seems to crash when it's just trying to read a plain old int member (answer: the instance in the call of the member was a null reference, so the this pointer is null, and your member is computed to be located at address 8).
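The "address 8" in that scenario is simply the member's offset within the object: with a null this pointer, the member's address is computed as 0 plus that offset. A small sketch that inspects the offset without invoking any undefined behavior; `Widget` and its members are made up for illustration, and the concrete value 8 assumes a typical platform with an 8-byte `double`:

```cpp
#include <cstddef>

// Hypothetical type; the exact layout is implementation-defined.
struct Widget {
    double weight;   // first member, at offset 0
    int    count;    // typically at offset 8 when double is 8 bytes wide
};

// Offset of `count` from the start of a Widget. If the object were "located"
// at address 0, reading `count` would touch this small address, which is why
// a crash in a trivial getter at an address like 0x8 hints at a null
// reference or null pointer further up the call chain.
std::size_t countOffset() {
    return offsetof(Widget, count);
}
```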


So how about checking for null references? You gave the line

if( & nullReference == 0 ) // null reference

in your question. Well, that won't work: According to the standard, you have undefined behavior if you dereference a null pointer, and you cannot create a null reference without dereferencing a null pointer, so null references exist only inside the realm of undefined behavior. Since your compiler may assume that you are not triggering undefined behavior, it can assume that there is no such thing as a null reference (even though it will readily emit code that generates null references!). As such, it sees the if() condition, concludes that it cannot be true, and just throws away the entire if() statement. With the introduction of link time optimizations, it has become plain impossible to check for null references in a robust way.
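Since the check cannot be done reliably on the reference, it has to happen on the pointer, before the reference is formed. A minimal sketch of a checked variant of the toReference() function from above; the name `toReferenceChecked` and the choice of `std::invalid_argument` are assumptions made for this illustration:

```cpp
#include <stdexcept>

// Checked counterpart of toReference(): the null test is performed on the
// pointer, where the compiler is not allowed to assume the answer.
int& toReferenceChecked(int* pointer) {
    if (pointer == nullptr) {
        throw std::invalid_argument("toReferenceChecked: null pointer");
    }
    return *pointer;   // pointer is known to be non-null here
}
```

Callers that can legitimately receive "no object" would catch the exception or, more simply, keep working with the pointer.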


TL;DR:

Null references are somewhat of a ghastly existence:

Their existence seems impossible (= by the standard),
but they exist (= by the generated machine code),
but you cannot see them if they exist (= your attempts will be optimized away),
but they may kill you unaware anyway (= your program crashes at weird points, or worse).
Your only hope is that they don't exist (= write your program to not create them).

I do hope that will not come to haunt you!

Community
cmaster - reinstate monica
  • What in the world do you mean by "you cannot create a null reference without dereferencing a null pointer"? The compiler does validate references by dereferencing them upon acquisition/initialization. Moreover, I have created a [code fragment](https://stackoverflow.com/a/64745536/11714860) that creates a null reference without dereferencing a null pointer. – Sapphire_Brick Jan 03 '21 at 06:39
  • @Sapphire_Brick Well, in your code example, you are not creating a null reference, you are creating an **uninitialized** reference: When you initialize the `union`, you are setting the pointer, not the reference. When you are using the reference in the next line, you are invoking undefined behavior by using the union member that has not been initialized. Of course, your compiler is free to give you a null reference in that case, and virtually all compilers will do that: The reference is just a pointer under the hood, and it shares its storage with a pointer you set to `nullptr`. – cmaster - reinstate monica Jan 03 '21 at 08:42
  • @cmaster-reinstatemonica Using a member of a union that isn't set is just reinterpreting the underlying memory, which can cause undefined behavior if the memory is invalid, and is used. Don't you agree that reading a `size_t` member of union where the `ssize_t` member is set does not create undefined behavior? Who cares if the reference field is uninitialized? The pointer field is initialized, and the reference uses the same memory. – Sapphire_Brick Jan 03 '21 at 18:12
  • @Sapphire_Brick That's how it was before strict aliasing rules came along. Now it's just as much undefined behavior as type punning a pointer. The compiler is free to schedule the read before the write. The only safe way for reinterpreting bits is a call to `memcpy()`. – cmaster - reinstate monica Jan 03 '21 at 23:57
  • @cmaster-reinstatemonica What about `volatile`? – Sapphire_Brick Jan 04 '21 at 18:15
  • @Sapphire_Brick `volatile` only forces the exact sequence and no omitted reads/writes on volatile variables, it does not provide any guarantees with respect to other variables. It's supposed to be used for memory mapped hardware registers, only. Implicit bit pattern conversion between `volatile` values via type punning or unions remains undefined behavior, afaik. – cmaster - reinstate monica Jan 04 '21 at 19:01
  • @cmaster-reinstatemonica You said "that's how it was before strict aliasing rules came along", but wouldn't that be backwards incompatible, since the compiler is _now_ free to schedule the read before the write? – Sapphire_Brick Jan 21 '21 at 18:29
  • @Sapphire_Brick Yes, that was the whole point of strict aliasing rules: To allow compilers optimizations they would not have been allowed to do by previous standards. Of course this broke existing code. The overheads of considering all memory accesses equal were visible throughout the entire C codebase, but the cases of pointer punning and `union` abuse were few and far between. Consequently, the positive impact of strict aliasing rules was deemed more important than the sporadic misbehavior of existing code. And that misbehavior could be fixed easily by adding some `memcpy()` calls. – cmaster - reinstate monica Jan 21 '21 at 18:53
11

clang++ 3.5 even warns on it:

/tmp/a.C:3:7: warning: reference cannot be bound to dereferenced null pointer in well-defined C++ code; comparison may be assumed to
      always evaluate to false [-Wtautological-undefined-compare]
if( & nullReference == 0 ) // null reference
      ^~~~~~~~~~~~~    ~
1 warning generated.
Jan Kratochvil
9

If your intention was to find a way to represent null in an enumeration of singleton objects, then it's a bad idea to (de)reference null (in C++11, nullptr).

Why not declare a static singleton object that represents NULL within the class as follows, and add a cast-to-pointer operator that returns nullptr?

Edit: Corrected several typos and added an if-statement in main() to test that the cast-to-pointer operator actually works (which I forgot to do, my bad) - March 10 2015 -

// Error.h
#include <memory>
#include <string>
#include <vector>

class Error {
public:
  static Error& NOT_FOUND;
  static Error& UNKNOWN;
  static Error& NONE; // singleton object that represents null

  static std::vector<std::shared_ptr<Error>> _instances;
  static Error& NewInstance(const std::string& name, bool isNull = false);

  // Yields nullptr for the NONE singleton, so it compares equal to nullptr.
  operator Error*() { return _isNull ? nullptr : this; }

private:
  std::string _name;
  bool _isNull;

  Error(const std::string& name, bool isNull) : _name(name), _isNull(isNull) {}
  Error(const Error&) = delete;            // singletons are not copyable
  Error& operator=(const Error&) = delete;
};

// Error.cpp
#include "Error.h"

std::vector<std::shared_ptr<Error>> Error::_instances;

// The default argument belongs to the declaration only.
Error& Error::NewInstance(const std::string& name, bool isNull)
{
  // make_shared cannot reach the private constructor, hence the explicit new.
  std::shared_ptr<Error> pNewInst(new Error(name, isNull));
  Error::_instances.push_back(pNewInst);
  return *pNewInst;
}

Error& Error::NOT_FOUND = Error::NewInstance("NOT_FOUND");
Error& Error::UNKNOWN = Error::NewInstance("UNKNOWN");
Error& Error::NONE = Error::NewInstance("NONE", true); // the only "null" instance

// Main.cpp
#include <cstdlib>
#include "Error.h"

Error& getError() {
  return Error::UNKNOWN;
}

// Edit: To see the overload of "Error*()" in Error.h actually working
Error& getErrorNone() {
  return Error::NONE;
}

int main() {
  // Both operands convert to Error*; UNKNOWN is non-null, NONE is nullptr.
  if (getError() != Error::NONE) {
    return EXIT_FAILURE;
  }

  // Edit: To see the overload of "Error*()" in Error.h actually working
  if (getErrorNone() != nullptr) {
    return EXIT_FAILURE;
  }
  return EXIT_SUCCESS;
}
David Lee