Why is dereferencing a null pointer undefined behaviour?

Question

According to ISO C++, dereferencing a null pointer is undefined behaviour. My curiosity is: why? Why has the standard decided to declare it undefined behaviour? What is the rationale behind this decision? Compiler dependency? It doesn't seem so, because according to the C99 standard, as far as I know, it is well defined. Machine dependency? Any ideas?

Believe it or not, address 0 is usable on the x86, so at times, you may actually need to dereference a "null" pointer. — Earlz, Jul 22 '11 at 16:45
@drb: [nasal demons](http://www.catb.org/jargon/html/N/nasal-demons.html) for instance... — Marcus Borkenhagen, Jul 22 '11 at 16:47
I'd say it's the same reason that `free(0)` is guaranteed to do nothing: it allows you to use pointers to convey state information without a separate state variable. If you're on an x86 where you've malloced memory at 0, I guess you're out of luck. Check out the `unique_ptr` implementation of a move: all you need is `other.ptr = 0` and you're safe. — Kerrek SB, Jul 22 '11 at 16:50
Some platforms allow poking at null, some do not, so you can't reliably allow such a practice. — fazo, Jul 22 '11 at 16:51
"according to C99 standard, as far as I know, [dereferencing NULL] is well defined". Can anyone confirm this is true? — Robᵩ, Jul 22 '11 at 16:59
@Rob - No, that is not true. C99 says that dereferencing invalid pointers is undefined. And then lists being null as one way to be invalid. — Bo Persson, Jul 22 '11 at 17:10
@Rob: it is not true. 6.5.3.2/4 says "If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.", with a footnote that includes "Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer". — Mike Seymour, Jul 22 '11 at 17:11
The null pointer doesn't necessarily refer to the address 0. — , Jul 22 '11 at 18:53
@Earlz I'm told there are platforms which use a nonzero bit pattern to represent NULL in order to make 0 a valid address. Of course, this complicates the conversion between pointers and integers (since `(uintptr_t)NULL` is guaranteed to be 0 and `(void*)0` is guaranteed to be a null pointer). Similarly, "all bits zero" (e.g. via `memset(dst,0,size)` and `calloc()`) will give you integer 0, but not necessarily floating 0 or a null pointer (or function pointer, which is different to a normal pointer); the standard is ambiguous about whether `_Bool` counts as an integer type or not. — tc., Nov 07 '12 at 19:24
@Earlz: I'd suggest that the quality of x86 coding would be better if x86 compilers had never allowed writing to a null pointer, but had offered intrinsics to read and write 8, 16, or 32-bit quantities from any specified segment/offset combination. A compiler could regard `__segofs_write32(0,0, (unsigned long)handler);` as a store to literal address zero without having to allow stores to null pointers. — supercat, May 28 '15 at 21:54
@MikeSeymour: why doesn't a null pointer necessarily refer to address 0? — Destructor, Aug 27 '15 at 16:27
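
Kerrek SB's comment can be sketched in C. In the example below, `struct buffer`, `buffer_move` and `buffer_destroy` are hypothetical names made up for illustration. A null `data` pointer is the only record that a buffer owns nothing. Cleanup never needs a separate flag, because `free(NULL)` is defined to do nothing.

```c
#include <stdlib.h>

/* Hypothetical owning buffer type; a null `data` means "owns nothing". */
struct buffer {
    char *data;
    size_t size;
};

/* Transfer ownership from src to dst (precondition: dst != src).
   Setting src->data to NULL is all the moved-from state needs,
   because free(NULL) is defined to do nothing. */
void buffer_move(struct buffer *dst, struct buffer *src) {
    free(dst->data);          /* release whatever dst owned; may be NULL */
    dst->data = src->data;
    dst->size = src->size;
    src->data = NULL;         /* the null pointer records "empty" without a flag */
    src->size = 0;
}

/* Safe to call on both moved-from and owning buffers. */
void buffer_destroy(struct buffer *b) {
    free(b->data);            /* no-op when b->data is NULL */
    b->data = NULL;
    b->size = 0;
}
```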

Mark Ransom · Accepted Answer · 2019-03-09T22:07:32.187

Defining consistent behavior for dereferencing a NULL pointer would require the compiler to check for NULL pointers before each dereference on most CPU architectures. This is an unacceptable burden for a language that is designed for speed.

It also only fixes a small part of a larger problem - there are many ways to have an invalid pointer beyond a NULL pointer.
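
To make the cost concrete, here is a minimal sketch that assumes a hypothetical rule: reading through a null pointer yields 0. The function names are invented for illustration. Under such a rule, every `*p` whose operand the compiler cannot prove non-null would have to compile like `checked_read` rather than `plain_read`, with an extra compare and branch on each access. The check only catches null. It does nothing for dangling or uninitialized pointers, which is the larger problem mentioned above.

```c
#include <stddef.h>

/* Plain dereference: on most targets a single load instruction.
   If p is null (or otherwise invalid), the behaviour is undefined. */
int plain_read(const int *p) {
    return *p;
}

/* What every such dereference would have to become if the language
   (hypothetically) defined a null read to yield 0. */
int checked_read(const int *p) {
    if (p == NULL)
        return 0;             /* the hypothetical "defined" result */
    return *p;
}
```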

edited Mar 09 '19 at 22:07

answered Jul 22 '11 at 16:53

Mark Ransom

You're assuming that `NULL` *has* to be special, and from what I understand, the OP's question is rather, *why* should it be special? – user541686 Jul 22 '11 at 17:01
@Mehrdad: How does it assume that `NULL` is special? It's no more special with respect to dereferencing than an uninitialized pointer or a pointer that no longer points to an existent object. – James McNellis Jul 22 '11 at 17:32
@James: Well because there's no reason the compiler *has* to check for `NULL` pointers (or other invalid pointers, for that matter)... it dereferences it like any other pointer if it wants to, if it's not special. Only if it were special would the compiler have to check. – user541686 Jul 22 '11 at 17:34
@Mehrdad, that was the point of my second paragraph - NULL pointers are *not* special and should not be. – Mark Ransom Jul 22 '11 at 17:50
@Mehrdad "_why should it be special?_" Because it **is** a special value. There is no other valid pointer value that is not either the address of an object or one-past-the-end of some array. OTOH, dereferencing a null pointer is **not** a special case. – curiousguy Sep 30 '11 at 01:15

Jerry Coffin · Answer 2 · 2011-07-22T17:05:12.083

The primary reason is that by the time they wrote the original C standard there were a number of implementations that allowed it, but gave conflicting results.

On the PDP-11, it happened that address 0 always contained the value 0, so dereferencing a null pointer also gave the value 0. Quite a few people who used these machines felt that, since they were the original machines C had been written on and used to program, this should be considered canonical behavior for C on all machines (even though it originally happened quite accidentally).

On some other machines (Interdata comes to mind, though my memory could easily be wrong) address 0 was put to normal use, so it could contain other values. There was also some hardware on which address 0 was actually some memory-mapped hardware, so reading/writing it did special things, not at all equivalent to reading/writing normal memory.

The camps wouldn't agree on what should happen, so they made it undefined behavior.

Edit: I suppose I should add that by the time they wrote the C++ standard, its being undefined behavior was already well established in C, and (apparently) nobody thought there was a good reason to create a conflict on this point, so they kept it the same.

It's also worth noting that before C89 was published, it didn't impose any requirements with regard to *any* behaviors and yet many C implementations did define behaviors of many things. If some C compilers defined a behavior for some action, and some didn't, leaving the behavior undefined merely preserved the status quo. It's only recently that the Standard's failure to define things has been interpreted as an indication that no reasonable code--even code targeting platforms which defined the behavior before there *was* a C standard--should make use of anything not in the Standard. — supercat, Aug 12 '16 at 23:05

Mike Seymour · Answer 3 · 2011-07-22T17:12:51.640

The only way to give defined behaviour would be to add a runtime check to every pointer dereference, and every pointer arithmetic operation. In some situations, this overhead would be unacceptable, and would make C++ unsuitable for the high-performance applications it's often used for.

C++ allows you to create your own smart pointer types (or use ones supplied by libraries), which can include such a check in cases where safety is more important than performance.
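
C has no smart pointers, but the same opt-in idea can be expressed with a wrapper type and an accessor. This is a sketch only. `checked_int_ptr` and `checked_get` are hypothetical names. Reporting failure through a `bool` return is just one possible design; a C++ smart pointer might throw or assert instead. Code that does not use the wrapper pays nothing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "checked pointer": the null check lives in an accessor
   that callers opt into, instead of in every dereference. */
typedef struct {
    int *ptr;
} checked_int_ptr;

/* Returns false instead of dereferencing when the wrapped pointer is null;
   on success, stores the pointed-to value in *out and returns true. */
bool checked_get(checked_int_ptr cp, int *out) {
    if (cp.ptr == NULL)
        return false;
    *out = *cp.ptr;
    return true;
}
```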

Dereferencing a null pointer is also undefined in C, according to clause 6.5.3.2/4 of the C99 standard.

edited Jul 22 '11 at 17:12

answered Jul 22 '11 at 16:56

Mike Seymour

That's not true. The defined behavior can simply be "you may dereference the null pointer as long as the value is not accessed. If the value of the resulting lvalue is accessed, behavior is undefined". This doesn't need any check. – Johannes Schaub - litb Jul 22 '11 at 17:21
@Johannes: Yes, you're right; I was interpreting "dereferencing" as "accessing the dereferenced value", which isn't strictly accurate. – Mike Seymour Jul 22 '11 at 17:25
@Johannes Schaub - litb: I posted excerpts from another answer of yours as an answer here. If you want to add that as an answer of your own, please feel free to do so. I would delete the one marked community wiki if so. – Alok Save Jul 22 '11 at 17:30
@Als I don't do dupe posts. But I've upvoted yours. Thanks for spreading the words. Have fun :) – Johannes Schaub - litb Jul 22 '11 at 17:32
@Johannes Schaub - litb: Okay:) Anyways, I marked that community wiki while posting it! – Alok Save Jul 22 '11 at 17:36
@Johannes, the only way I can think of to dereference a pointer without accessing the value is to assign it to a reference. There might be some value to having a compiler in debug mode trapping attempts to create a reference from a NULL pointer, so leaving the behavior undefined has some benefits - compilers are free to go beyond the standard. – Mark Ransom Jul 22 '11 at 17:56
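
C already defines one narrow form of the "dereference without accessing" idea discussed in these comments. C99 6.5.3.2p3 says that for `&*p`, neither the `*` nor the `&` is evaluated and the result is as if both were omitted. So `&*p` is simply `p`, even when `p` is null. The sketch below uses hypothetical function names. As far as I know, C++ has no corresponding rule, which is part of why the question is subtler there.

```c
#include <stddef.h>

/* C99 6.5.3.2p3: &*p is evaluated as if both operators were omitted,
   so no access takes place and the result is p, even for a null p. */
int *address_of_deref(int *p) {
    return &*p;               /* well defined in C when p is null */
}

/* By contrast, this actually reads the object: undefined for a null p. */
int value_of_deref(const int *p) {
    return *p;
}
```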

score 8 · Answer 4 · edited May 23 '17 at 11:45

This answer from @Johannes Schaub - litb puts forward an interesting rationale that seems pretty convincing:

The formal problem with merely dereferencing a null pointer is that determining the identity of the resulting lvalue expression is not possible: Each such expression that results from dereferencing a pointer must unambiguously refer to an object or a function when that expression is evaluated. If you dereference a null pointer, you don't have an object or function that this lvalue identifies. This is the argument the Standard uses to forbid null-references.

Another problem that adds to the confusion is that the semantics of the typeid operator make part of this misery well defined. It says that if it is given an lvalue that resulted from dereferencing a null pointer, the result is throwing a bad_typeid exception. This is, however, a limited area where there exists an exception (no pun intended) to the above problem of finding an identity. Other cases exist where a similar exception to undefined behavior is made (although much less subtle and with a reference on the affected sections).

The committee discussed solving this problem globally, by defining a kind of lvalue that does not have an object or function identity: the so-called empty lvalue. That concept, however, still had problems, and they decided not to adopt it.

Note: Marking this as community wiki, since the answer & the credit should go to the original poster. I am just pasting the relevant parts of the original answer here.

IMHO, many issues could have been resolved by better defining the meanings of "C objects" and addresses, recognizing that an N-byte C object has N+1 associated addresses, the first N of which each *identify* one byte and the last N of which each *follow* one byte. This definition could generalize to zero-byte objects, which have a single address that neither identifies nor follows any byte of storage, and may or may not match the address of any other zero-byte object. — supercat, Apr 11 '19 at 20:58

score 5 · Answer 5 · answered Jul 22 '11 at 16:55

The real question is: what behavior would you expect?

A null pointer is, by definition, a singular value that represents the absence of an object. The result of dereferencing a pointer is to obtain a reference to the object pointed to.

So how do you get a good reference... from a pointer that points into the void?

You do not. Thus the undefined behavior.

Matthieu M.

Throw an exception? Raise a signal? Call `abort()`? There are plenty of sensible things that *could* be defined; the question is, why leave it undefined? – Mike Seymour Jul 22 '11 at 17:20
@Mike Seymour: It seems that we did not interpret the question the same way :) Checking the dereference beforehand would be costly. On the other hand, on Unix, the OS *is* performing the check anyway, so a signal handler could theoretically be hooked up and perform one of the actions you cite... but I do not think this is viable everywhere, specifically on embedded platforms without an OS. Specifying a behavior would cripple those platforms. – Matthieu M. Jul 23 '11 at 09:34
@MikeSeymour Throwing an exception where there is no `throw` is hardly a sensible thing to do. (Yes, you can draw conclusions about Java.) – curiousguy Sep 30 '11 at 01:21

score 1 · Answer 6 · answered Mar 19 '20 at 11:48

Arguments have been made elsewhere that well-defined behaviour for null-pointer dereferences is impossible without a lot of overhead, which I think is true. This is because AFAIU "well-defined" here also means "portable". If you did not treat null-pointer dereferences specially, you would end up generating instructions that simply try to read address 0, but that produces different behaviour on different processors, so it would not be well-defined.

So, I guess this is why dereferencing nullptr (and probably also other invalid pointers) is marked as undefined.

I do wonder why this is undefined rather than unspecified or implementation-defined, which are distinct from undefined behaviour but require more consistency.

In particular, when a program triggers undefined behaviour, the compiler can do pretty much anything (e.g. throw away your entire program?) and still be considered correct, which is somewhat problematic. In practice, you would expect compilers to just compile a null-pointer dereference to a read of address zero. However, modern optimizers are getting better but also more sensitive to undefined behaviour, and I think they sometimes do things that break the program more thoroughly. For example, consider the following:

```console
matthijs@grubby:~$ cat test.c
unsigned foo () {
        unsigned *foo = 0;
        return *foo;
}

matthijs@grubby:~$ arm-none-eabi-gcc  -c test.c -Os && objdump -d test.o

test.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <foo>:
   0:   e3a03000        mov     r3, #0
   4:   e5933000        ldr     r3, [r3]
   8:   e7f000f0        udf     #0
```

This program just dereferences a null pointer and reads through it. GCC still emits the load (`ldr`), but follows it with an undefined instruction (`udf`) instead of a return, which halts the program at runtime.

This might be OK when the null-pointer dereference is accidental, but in this case I was actually writing a bootloader that needs to read address 0 (which contains the reset vector), so I was quite surprised by this.
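
For the bootloader case, one common pattern is to hide the fixed address behind a `volatile` access through an integer-to-pointer conversion. The sketch below uses a hypothetical helper, `read_word_at`, and it is only a sketch. The integer-to-pointer conversion is implementation-defined, and calling it with address 0 is still a null-pointer access as far as the C standard is concerned, so `volatile` does not make it portable. With GCC, the documented way to tell the optimizer that address 0 may hold valid data is `-fno-delete-null-pointer-checks`. According to the GCC manual, the path isolation that turns such accesses into traps (`-fisolate-erroneous-paths-dereference`) depends on null-pointer-check deletion being enabled.

```c
#include <stdint.h>

/* Hypothetical helper for code that must read a fixed hardware address,
   e.g. a bootloader reading the reset vector. The volatile qualifier makes
   the compiler perform the load as written instead of treating it as an
   ordinary, optimizable memory read. Passing 0 remains a null access as far
   as ISO C is concerned; see the compiler flags mentioned above. */
static inline uint32_t read_word_at(uintptr_t addr) {
    return *(const volatile uint32_t *)addr;
}
```

On a hosted system, address 0 is normally unmapped, so the helper can only be tried there on the address of an ordinary variable.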

So, not so much an answer, but some extra perspective on the matter.

score 1 · Answer 7 · answered Jul 22 '11 at 16:57

I suspect it's because if the behavior were well-defined, the compiler would have to insert code anywhere pointers are dereferenced. If it's implementation-defined, then one possible behavior could still be a hard crash. If it's unspecified, then either the compilers for some systems carry an undue extra burden, or they may generate code that causes hard crashes.

Thus to avoid any possible extra burden on compilers they left the behavior undefined.

score 1 · Answer 8 · answered Jul 22 '11 at 16:58

Sometimes you need an invalid pointer (see also MmBadPointer on Windows) to represent "nothing".

If everything was valid, then that wouldn't be possible. So they made NULL invalid, and disallowed you from dereferencing it.

user541686

score 1 · Answer 9 · answered Jul 22 '11 at 20:00

Here is a simple test & example:

Allocate a pointer:

```c
int * pointer;
```

- What value is in the pointer when it is created?
- What is the pointer pointing to?
- What happens when I dereference this pointer in its current state?

Consider marking the end of a linked list. In a linked list, each node points to another node, except for the last one.

- What is the value of the pointer in the last node?
- What happens when you dereference the "next" field of the last node?

There needs to be a value that indicates a pointer is not pointing to anything, or that it is in an invalid state. This is where the NULL pointer concept comes into play: the linked list can use a NULL pointer to indicate the end of the list.
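
A minimal C sketch of the list described above follows. The `struct node` layout and `list_length` are illustrative and not from the answer. The loop compares each `next` pointer with NULL before following it, so the terminating null pointer is tested but never dereferenced.

```c
#include <stddef.h>

/* Singly linked list node; the last node's `next` is a null pointer. */
struct node {
    int value;
    struct node *next;
};

/* Counts nodes, checking for NULL before every dereference, so the
   terminating null pointer is compared but never dereferenced. */
size_t list_length(const struct node *head) {
    size_t n = 0;
    for (const struct node *it = head; it != NULL; it = it->next)
        n++;
    return n;
}
```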

score 0 · Answer 10 · answered May 28 '15 at 12:04

Although dereferencing a NULL pointer in C/C++ indeed leads to undefined behavior from the language standpoint, such an operation is well defined in compilers for targets that have memory at the corresponding address. In this case, the result of such an operation is simply a read of the memory at address 0.

Also, many compilers will allow you to dereference a NULL pointer as long as you don't access the referenced value. This is done to provide compatibility with non-conforming yet widespread code, like:

```c
#define offsetof(st, m) ((size_t)(&((st *)0)->m))
```

There was even a discussion to make this behaviour part of the standard.
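
For comparison, the conforming way to get a member offset is the standard `offsetof` macro from `<stddef.h>`. An implementation may define it with a null-pointer trick like the one above because it knows its own platform, but portable code should not need to. Following the idea in supercat's comment below, the same value can also be computed from a real object, with no null pointer involved. `struct example` and both functions are hypothetical.

```c
#include <stddef.h>

struct example {              /* hypothetical struct for illustration */
    char tag;
    double value;
};

/* Portable: the standard offsetof macro from <stddef.h>. */
size_t value_offset(void) {
    return offsetof(struct example, value);
}

/* Same quantity computed from a real object, so no null pointer is involved:
   the byte distance between the member's address and the object's address.
   Only addresses are taken; the uninitialized object is never read. */
size_t value_offset_from_object(void) {
    struct example e;
    return (size_t)((char *)&e.value - (char *)&e);
}
```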

Dmitry Grigoryev

There's no reason to expect the cast above to work in general, even if a null pointer were treated no differently from any other, since systems are not required to use any particular mapping between pointers and integers. A more interesting notion, if there's a global `char* x;` somewhere that will never be modified, would be `((char*)&(((struct_type*)x)->member) - x)`. In all cases where the expression is defined, it will yield the (constant) offset of that member, and if the compiler can't tell if `x` holds a pointer to `struct_type` the most efficient way to... – supercat May 29 '15 at 17:20
...evaluate that expression would be to have it yield that constant directly without involving `x` at run-time. – supercat May 29 '15 at 17:22

score 0 · Answer 11 · answered Jul 22 '11 at 17:09

According to the original C standard, the internal representation of a null pointer can be any value, not necessarily zero.

The language definition states that for each pointer type, there is a special value, the 'null pointer', which is distinguishable from all other pointer values and which is 'guaranteed to compare unequal to a pointer to any object or function.' That is, a null pointer points definitively nowhere; it is not the address of any object or function.

There is a null pointer for each pointer type, and the internal values of null pointers for different types may be different.

(From http://c-faq.com/null/null1.html)
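
Because the null pointer's representation need not be all-bits-zero, portable C sets pointers to null through the language rather than through `memset` or `calloc`. That means assigning `0`/`NULL`, or using an initializer such as `{0}`. Comparisons like `p == NULL` are likewise defined regardless of the representation. A sketch with a hypothetical `struct record`:

```c
#include <stddef.h>
#include <string.h>

struct record {               /* hypothetical struct for illustration */
    int id;
    const char *name;
};

/* Portable: members not listed in {0} are initialized as if they had static
   storage duration, so `name` becomes a null pointer whatever its bit pattern. */
void record_init(struct record *r) {
    *r = (struct record){0};
}

/* Not portable: memset writes all-bits-zero, which is a null pointer only on
   platforms whose null representation happens to be all bits zero. */
void record_init_memset(struct record *r) {
    memset(r, 0, sizeof *r);
}
```

On the common platforms where null is all-bits-zero, the two functions behave the same, which is why the `memset` version often goes unnoticed.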

score -1 · Answer 12 · answered Jul 22 '11 at 16:53

Because you cannot create a null reference. C++ doesn't allow it. Therefore you cannot dereference a null pointer.

Mainly it is undefined because there is no logical way to handle it.

Rocky Pulley

You certainly can create and (attempt to) dereference a null pointer in C++. – Mike Seymour Jul 22 '11 at 17:00
The point is you can't create a null reference, so how should it be defined when you try to use a back-door solution? Just undefine it. – Rocky Pulley Jul 22 '11 at 17:01
@RockyTriton Dereferencing a pointer yields an **lvalue**, not a reference. In C++ there is no expression that has reference type. – curiousguy Sep 30 '11 at 01:24
@MikeSeymour The question is about the result of `*nullpointer`. That would be a "null lvalue", which is certainly what Rocky meant. A "null lvalue" would be an lvalue at "null address". "null address" is a contradiction. – curiousguy Sep 30 '11 at 01:26

score -4 · Answer 13 · answered Jul 22 '11 at 21:45

You can actually dereference a null pointer. Someone did it here: http://www.codeproject.com/KB/system/soviet_kernel_hack.aspx

myeviltacos
