What part of dereferencing NULL pointers causes undesired behavior?

Question

I am curious as to what part of dereferencing a NULL pointer causes undesired behavior. Example:

//  #1
someObj * a;
a = NULL;
(*a).somefunc();   // crash, dereferenced a null ptr and called one of its functions
                   // same as a->somefunc();

//  #2
someObj * b;
anotherObj * c;
b = NULL;
c->anotherfunc(*b);   // dereferenced the ptr, but didn't call one of its functions

In #2 I didn't actually try to access data or a function through b. Would this still cause undesired behavior if *b just resolves to NULL and we're passing NULL into anotherfunc()?

Isn't the 2nd example a bad one because anotherObj* c was not initialized? c very well could be NULL, so c->anotherfunc very well could dereference a NULL. Right? — Tom, Jul 10 '09 at 15:48
@Tom: If left uninitialized, c is likely to be something much less convenient than NULL. — Steve S, Jul 10 '09 at 16:59

score 6 · Answer 1 · answered Jul 10 '09 at 15:37

There is a concept, in the standard, of a null pointer value. This is a distinct value that causes undefined behavior when the program attempts to access memory through it. In practice, lots of modern implementations have it crash the program, which is useful behavior. After all, such an attempt is a mistake.

The null pointer value is spelled 0, or any other integral constant expression that evaluates to zero, used in a pointer context (3 - 3, for example). There is also a NULL macro, which has to evaluate to 0 in C++ but can be (void *)0 in C (C++ insists more on pointers being type-safe). In C++0x, there will be an explicit value called nullptr, finally giving the null pointer an explicit name.

The value of the null pointer doesn't have to be an actual zero, although it is on all implementations I'm aware of, and the odd computers where that didn't work have mostly been retired.
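As a small illustration of the spellings mentioned above, the following sketch checks that a pointer initialized from 0, from NULL, and from nullptr all hold the same null pointer value. It assumes a C++11 or later compiler (for nullptr); the function name null_spellings_agree is made up for this example.

```cpp
#include <cassert>
#include <cstddef>   // NULL

// Returns true if the three spellings of "null pointer" produce
// pointers that compare equal to one another.
bool null_spellings_agree() {
    int* from_zero    = 0;        // integer literal 0 in a pointer context
    int* from_macro   = NULL;     // the NULL macro
    int* from_keyword = nullptr;  // the C++11 keyword

    assert(!from_zero);           // a null pointer converts to false

    return from_zero == from_macro && from_macro == from_keyword;
}
```

None of this inspects the bit pattern of the pointer, which is the point: the comparison is defined in terms of the null pointer value, not in terms of address zero.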

You're misstating what happens in your last example. *b doesn't resolve into anything. Passing *b is undefined behavior, which means the implementation can do anything it likes with it. It may or may not be flagged as an error, and may or may not cause problems. The behavior can change for no apparent reason, and so doing this is a mistake.

If a called function is expecting a pointer value, passing it a null pointer value is perfectly legitimate, and the called function should handle it properly. Dereferencing a null pointer value is never legitimate.
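A minimal sketch of that distinction, assuming two hypothetical helpers: one treats a null pointer as a legitimate argument and handles it, the other documents "must not be null" as a precondition. Neither ever dereferences a null pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns the length of a C string, treating a
// null pointer as "no string" rather than dereferencing it.
std::size_t length_or_zero(const char* text) {
    if (text == nullptr) {
        return 0;                  // null is a legitimate argument here
    }
    return std::strlen(text);      // text points at a real string
}

// Hypothetical variant that makes "not null" a documented precondition.
std::size_t length_required(const char* text) {
    assert(text != nullptr && "caller must pass a valid string");
    return std::strlen(text);
}
```

Passing the pointer around is fine in both cases; the only thing that has to be guarded is the point where the function reads through it.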

The '*b' does resolve to something but it does not de-reference 'b'. According to the standard the 'unary operator *' when applied to a pointer returns an lvalue referring to the pointed at object (see below). – Martin York, Jul 10 '09 at 22:33
Except that trying to dereference the null pointer is undefined behavior, and so the implementation can do whatever it likes. There is no pointed-at object. Could you indicate where in the standard you're referring to? – David Thornley, Jul 13 '09 at 14:30

score 4 · Answer 2 · answered Jul 10 '09 at 15:47

Whether or not the mere fact of dereferencing a null pointer already results in undefined behavior is currently a gray zone in the Standard, unfortunately. What is certain is that reading a value out of the result of dereferencing a pointer is undefined behavior.

That it is undefined behavior is stated by various notes throughout the Standard. But notes are not normative: They could say anything, but they will never be able to state any rules. Their purpose is entirely informative.

Calling a member function on a null pointer is formally undefined behavior too.

The formal problem with merely dereferencing a null pointer is that determining the identity of the resulting lvalue expression is not possible: Each such expression that results from dereferencing a pointer must unambiguously refer to an object or a function when that expression is evaluated. If you dereference a null pointer, you don't have an object or function that this lvalue identifies. This is the argument the Standard uses to forbid null-references.

Another problem that adds to the confusion is that the semantics of the typeid operator make part of this misery well defined. The Standard says that if typeid is given an lvalue of polymorphic class type that resulted from dereferencing a null pointer, the result is a thrown bad_typeid exception. This is, however, a limited area where there exists an exception (no pun intended) to the above problem of finding an identity. Other cases exist where a similar exception to undefined behavior is made (although much less subtle, and with a reference to the affected sections).
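The typeid case can be tried directly, since it is one of the few places where applying * to a null pointer has specified behavior. This is a sketch under the assumption that the operand is a polymorphic class type (the rule does not apply otherwise); Shape and typeid_throws_for are names invented for the example.

```cpp
#include <cassert>
#include <typeinfo>

// A polymorphic type, so typeid on an lvalue of it is evaluated at run time.
struct Shape {
    virtual ~Shape() {}
};

// Returns true if typeid(*p) throws std::bad_typeid for the given pointer.
bool typeid_throws_for(Shape* p) {
    try {
        const std::type_info& info = typeid(*p);  // specified to throw if p is null
        assert(info == typeid(Shape));            // Shape is the only type in this sketch
        return false;
    } catch (const std::bad_typeid&) {
        return true;
    }
}
```

Calling typeid_throws_for with a null pointer takes the catch branch; calling it with the address of a real Shape does not.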

The committee discussed solving this problem globally by defining a kind of lvalue that does not have an object or function identity: the so-called empty lvalue. That concept, however, still had problems, and they decided not to adopt it.

Now, practically, you will not encounter a crash when merely dereferencing a null pointer. The problem of identifying an object or function for an lvalue seems to be entirely language-theoretical. What is problematic is when you try to read a value out of the result of the dereference. The following case will almost certainly crash, because it tries to read an integer from an address which is most probably not mapped by the affected process:

int a = *(int*)0;

There are a few cases where reading out of such an expression probably won't cause a crash. One is when you dereference an array pointer:

int *pa = *(int(*)[1])0;

Since reading from an array just returns its address using an element pointer type, this will most probably just produce a null pointer (but as you dereferenced a null pointer before, this is still formally undefined behavior). Another case is dereferencing null function pointers. Here too, reading a function lvalue just gives you its address, but using a function pointer type:

void(*pf)() = *(void(*)())0;

As with the other cases, this is undefined behavior too, of course, but will probably not result in a crash.
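The "reading an array or function lvalue only yields an address" part can be seen safely with valid objects. The sketch below is an illustration, not a claim about the null case: it uses a real array and a real function (storage and noop are made-up names) and shows that the conversion produces an address without touching the pointed-to contents.

```cpp
#include <cassert>

void noop() {}

// Array lvalue -> pointer to first element: no element is read by the conversion.
bool array_lvalue_decays_to_address() {
    int storage[1] = {42};
    int (*array_ptr)[1] = &storage;   // pointer to the whole array
    int* first = *array_ptr;          // *array_ptr is an array lvalue; it decays
    assert(*first == 42);             // the actual memory read happens only here
    return first == &storage[0];
}

// Function lvalue -> pointer to function: nothing is called or read.
bool function_lvalue_decays_to_address() {
    void (*fn_ptr)() = &noop;
    void (*same)() = *fn_ptr;         // *fn_ptr is a function lvalue; it decays
    return same == &noop;
}
```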

Like the above cases, just calling a non-virtual member function on a null pointer most probably isn't practically problematic either, even though it is formally undefined behavior. Calling the function will jump to the function's address, and doesn't need to read any data. As soon as you try to read a nonstatic data member, the same problem occurs as when reading out of a normal null pointer. Some people place an

assert(this != NULL);

in front of some member function bodies in case they accidentally call a function on a null pointer. This may be a good idea when such functions are often mistakenly called on null pointers, to catch errors early. But from a formal point of view, this can never be a null pointer in a member function.

score 3 · Answer 3 · answered Jul 10 '09 at 15:22

The second example is also undefined behavior, yes. You are only allowed to call member functions on a valid object. And a null pointer does not point to a valid object.

The reason why it appears to work is that member functions are typically implemented roughly like this:

void anotherfunc(anotherObj* this, someObj& arg);

That is, the "this" pointer is basically passed to the function as a separate argument. So when calling the function, the compiler doesn't check that the this pointer is valid; it just passes it to the function.

It is still undefined behavior though. The compiler isn't guaranteed to let this work.
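To make the "this is just another argument" picture concrete, here is a sketch that writes the same operation twice: once as an ordinary member function and once as a free function taking the object pointer explicitly. The type layouts and the name anotherfunc_lowered are assumptions for illustration; real compilers are free to implement member calls differently.

```cpp
#include <cassert>

struct SomeObj {
    int value;
};

struct AnotherObj {
    int total;
    // Ordinary member function: the object arrives as the implicit this.
    void anotherfunc(SomeObj& arg) { total += arg.value; }
};

// Hand-written equivalent: "self" stands in for the implicit this pointer.
void anotherfunc_lowered(AnotherObj* self, SomeObj& arg) {
    assert(self != nullptr);   // nothing in the call itself checks this
    self->total += arg.value;  // first point where object memory is touched
}
```

With a valid object, `target.anotherfunc(arg)` and `anotherfunc_lowered(&target, arg)` do the same thing. The model explains why a call through a null pointer often appears to work; it does not make that call defined.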

answered Jul 10 '09 at 15:22

jalf

True as long as anotherfunc is not virtual. If it is, the compiler needs the "this" pointer for the vtable in order to look up which override to call. – Nick Meyer Jul 10 '09 at 15:30
5.2.5/3 says `pointer->func();` is equivalent to `(*pointer).func();` in case pointer is really a pointer to a class type. But I actually found only two *comments* about dereferencing null pointers: first, in section 1.9 as an *example* of undefined behaviour, and secondly, in section 8.3.2 as a note that explains why null references can't exist. I would have expected to find it somewhere in section 5.2 [expr.post] and/or 5.3 [expr.unary]. Any ideas? – sellibitze Oct 26 '09 at 13:43
Well, it is undefined, they don't *have* to mention it explicitly. That's what "undefined" means ;) If the standard does *not* say "you're allowed to call member functions on a null pointer", then it is undefined. – jalf Oct 26 '09 at 14:08

Martin York · Answer 4 · 2009-07-10T16:28:51.190

That depends on the declaration of anotherfunc()

someObj * b;
anotherObj * c;
b = NULL;
c->anotherfunc(*b);

If anotherfunc() accepts a reference to b then you have not de-referenced b, you have just converted it into a reference. If on the other hand it is a value parameter then a copy constructor will be invoked and then you have de-referenced it.
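The difference between the two signatures can be observed by counting copy-constructor calls. This sketch deliberately uses a valid object; with a null b the call remains undefined behavior whichever signature is used. The copies counter and the by_reference / by_value functions are names made up for the illustration.

```cpp
#include <cassert>

struct SomeObj {
    static int copies;                     // counts copy-constructor calls
    SomeObj() {}
    SomeObj(const SomeObj&) { ++copies; }  // runs only when a copy is made
};
int SomeObj::copies = 0;

void by_reference(SomeObj&) {}   // binds to the caller's object; no copy
void by_value(SomeObj) {}        // copy-constructs a parameter from the argument

// Each helper returns how many copies a single call made.
int copies_made_by_reference() {
    SomeObj obj;
    SomeObj* b = &obj;
    assert(b != nullptr);        // this sketch only covers valid pointers
    int before = SomeObj::copies;
    by_reference(*b);
    return SomeObj::copies - before;
}

int copies_made_by_value() {
    SomeObj obj;
    SomeObj* b = &obj;
    assert(b != nullptr);
    int before = SomeObj::copies;
    by_value(*b);                // the copy constructor reads through b here
    return SomeObj::copies - before;
}
```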

Whether it will crash depends on many factors (such as whether the object has members). But the act of de-referencing a NULL pointer is undefined, so it has the option of working on your compiler.

As for the first option, calling a method on a NULL pointer: this is also undefined behavior. Whether it crashes will depend on the compiler and OS. But it is perfectly valid for it not to crash (the behavior is undefined).

A lot of confusion arises because people refer to the * in *b as the de-reference operator. This may be its common name, but in the standard it is the 'unary * operator' and it is defined as:

5.3.1

The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.

So the 'unary * operator' returns a reference to the object that was pointed at by the pointer it was applied to. (No de-referencing has happened at this point).

score 2 · Answer 5 · answered Jul 10 '09 at 15:15

Reading from or writing to the invalid memory location causes a crash.

A call to a member function through an invalid object pointer will usually succeed, if the method is not virtual and the method does not access any members of the object, since this involves no reads or writes related to the object pointer.

(This is not guaranteed by the standard, even though it works that way on all compilers I have ever encountered.)

score 2 · Answer 6 · answered Jul 10 '09 at 15:18

It would still cause a crash, but that's not necessarily undesired behaviour. Part of the usefulness of NULL is that, on most platforms, it points to memory that is explicitly inaccessible to your application, and causes a segmentation fault (or access violation) the very moment you try to dereference it.

Its purpose is to explicitly mark the contents of pointers as invalid.

score 2 · Answer 7 · answered Jul 10 '09 at 15:21

In practice, it doesn't crash until it needs to use the NULL value. This means that you can call non-virtual functions because they are bound at compile time. It calls the function just fine and passes in a NULL this pointer. Now if you try to use any member variables then it will crash because it will try to look them up based on the this pointer passed in. You can also call other non-virtual functions by the same argument. Now if you try to use a virtual function it will immediately crash because it tries to find the vtable from the NULL pointer.
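Why the virtual case fails earlier can be sketched with a hand-rolled dispatch table. This is only a model built from plain structs and free functions (every name in it is hypothetical, and real vtable layouts are implementation details), but it shows that a statically bound call never needs to look inside the object, while a dynamically bound one has to load the table pointer from it first.

```cpp
#include <cassert>

struct Animal;

// Hypothetical hand-rolled "vtable": one function pointer per virtual function.
struct AnimalVTable {
    int (*legs)(const Animal* self);
};

struct Animal {
    const AnimalVTable* vtable;   // stored inside every object
};

int dog_legs(const Animal*)  { return 4; }
int bird_legs(const Animal*) { return 2; }

const AnimalVTable dog_vtable  = { dog_legs };
const AnimalVTable bird_vtable = { bird_legs };

// "Non-virtual" call: the target is fixed at compile time, so the call
// itself never reads through self.
int fixed_legs(const Animal* /*self*/) { return 4; }

// "Virtual" call: the target must be loaded from the object first.
int virtual_legs(const Animal* self) {
    assert(self != nullptr);          // self->vtable below reads object memory
    return self->vtable->legs(self);
}
```

Because fixed_legs is a free function here, passing it a null pointer is well defined and simply returns 4. The corresponding call on a real non-virtual member function through a null pointer is still undefined behavior, even if it tends to behave the same way.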

We ran into a case like this and I had to write some example code to demonstrate to the other developers that even though it was reporting the error in 2 levels of calls to member functions it was actually a NULL pointer that was being called. The error was manifested when an actual value was used.

score 1 · Answer 8 · answered Jul 10 '09 at 15:43

In the early days, programmers were spending a lot of time tracking down memory corruption bugs. One day a light bulb lit up in some smart programmer's head. He said "What if I make it illegal to access the first page of memory and point all invalid pointers to it?" Once that happened, most memory corruption bugs were quickly found.

That's the history behind the null pointer. I heard the story so many years ago that I can't recall any details now, but I'm sure someone who's older... I mean wiser, can tell us more about it.

score 0 · Answer 9 · answered Jul 10 '09 at 15:15

Dereferencing a NULL pointer is undefined behavior.

It is not guaranteed to crash, and you are not guaranteed anything when doing it. For all you know someone somewhere in the world will be punched each time you do it. That is valid behavior since it is undefined.

Also, your pointers may not be initialized to NULL, so if you want them to be NULL for sure, you should set them explicitly.
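A short sketch of the ways a pointer can be given a definite null value, assuming C++11 or later. The names global_ptr and Node are invented for the example, and the uninitialized case is left commented out because reading such a pointer is itself undefined behavior.

```cpp
#include <cassert>

int* global_ptr;              // static storage duration: zero-initialized, so null

struct Node {
    Node* next = nullptr;     // default member initializer (C++11)
};

// Returns true if every explicitly initialized pointer below is null.
bool explicit_initialization_gives_null() {
    int* a = nullptr;         // explicit initialization
    int* b{};                 // value-initialization, also a null pointer
    // int* c;                // local, uninitialized: indeterminate value, do not read
    Node n;

    assert(a == b);           // both hold the null pointer value

    return a == nullptr && n.next == nullptr && global_ptr == nullptr;
}
```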

answered Jul 10 '09 at 15:15

Brian R. Bondy

"That is valid behavior since it is undefined". Although note that compiler-writers are bound by applicable laws in their jurisdiction, in addition to the C++ standard ;-) – Steve Jessop Jul 10 '09 at 15:45
no they're not. It'd still be a valid C++ implementation even if it broke the law. ;) – jalf Jul 10 '09 at 16:32
I'm just glad all those times I dereferenced a NULL Pointer, it didn't decide to format my computer. – Brian R. Bondy Jul 10 '09 at 17:01
@Brian: personally I prefer "catch fire" as the canonical undefined behaviour. Don't know why, it just appeals. Or Dilbert's QuikProtect software - "if you have a sound card, it swears at you". – Steve Jessop Jul 10 '09 at 17:18
@jalf: yes, it would be valid behaviour from the POV of the C++ standard. I didn't mean to imply illegal => invalid. However, when considering the likelihood of a C++ compiler doing X (as Brian said, "for all you know"), we consider all rules which the compiler-writers attempt to follow. For all I know, the compiler breaks the law, but then again for all I know, the compiler breaks the standard. It may then not actually be a C++ implementation, but I don't know that, either... – Steve Jessop Jul 10 '09 at 17:19

score 0 · Answer 10 · answered Jul 10 '09 at 15:15

You need to know more about anotherfunc() to tell what will happen when you pass it null. It might be fine, it might crash; it depends on the code.

answered Jul 10 '09 at 15:15

twolfe18

He doesn't actually pass NULL, he passes an object pointed to by NULL. – Michiel Buddingh Jul 10 '09 at 15:19
An object pointed to by null? Null doesn't point to anything. It is a special pointer value that means "points to nothing". – jalf Jul 10 '09 at 15:24
he does pass null. b has been declared but not initialized, so *b is null. – twolfe18 Jul 10 '09 at 15:25
Read the post again. b is initialized in the third line of the second snippet. *b is not a pointer, so it doesn't make sense to say that it is NULL either. – Nick Meyer Jul 10 '09 at 15:28

score 0 · Answer 11 · answered Jul 10 '09 at 15:21

It would still cause a crash because you're still instructing the compiler to attempt to access the memory at location 0 (which is forbidden). Depending on the signature of anotherfunc, you may be passing a reference (which is forbidden from being initialized with a NULL object), or a copy of *b.

score 0 · Answer 12 · answered Jul 10 '09 at 15:21

You are wandering in undefined territories.

You can think of calling a member function like calling a regular function with the additional, implicit this pointer argument. The function call itself is just putting the arguments in place according to the calling convention and jumping to a memory address.

So just calling a member function on a NULL object pointer does not necessarily cause a crash (unless it is a virtual function). You get invalid memory access crashes only when you try to access the object's member variables or vtable.

In case #2 you may or may not get an immediate crash, depending on how anotherfunc is declared. If it takes someObj by value, then you're indirecting NULL in the function call itself, resulting in a crash. If it takes someObj by reference, usually nothing happens since references are implemented using pointers under the hood and the actual indirection is postponed until you try to access member data.

GogaRieger · Answer 13 · 2009-07-10T15:28:41.640

Although in the standards dereferencing a zero pointer (NULL) is undefined behavior, current processors and operating systems generate a segmentation fault or similar error.

Maybe that function you called accepts a reference parameter (which IS a pointer) and that function doesn't use the parameter, so the NULL won't be dereferenced.

edited Jul 10 '09 at 15:28

answered Jul 10 '09 at 15:21

GogaRieger

score 0 · Answer 14 · answered Jul 10 '09 at 15:56

I agree with Buck, in that in many cases it would be nice if calling an instance function on null resulted in null. However, I don't think that it should be the default. Instead there should be another operator (I'll leave what that is up to someone else, but let's say it's ->>).

One issue in C++, for instance, is that not all return types can be null anyway, such as int. So for a call to a->>length(), it would be difficult to know what to return when a itself is null.

In other languages, where everything is a reference type, you would not have this problem.
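One way to picture the idea in today's C++ is a helper that wraps the result in a "value or nothing" type, which sidesteps the problem that int has no null. The sketch below is purely illustrative: MaybeResult, call_if_present and length_of are hypothetical names, not an existing operator or library API (C++17's std::optional plays the role of MaybeResult in the standard library).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type: either "no call happened" or a value.
template <typename R>
struct MaybeResult {
    bool has_value;
    R value;
    // Accessor that refuses to hand out a value that was never produced.
    const R& get() const {
        assert(has_value);
        return value;
    }
};

// Hypothetical stand-in for the proposed "->>": runs member_call on
// *object only when object is non-null.
template <typename T, typename F>
auto call_if_present(T* object, F member_call)
    -> MaybeResult<decltype(member_call(*object))> {
    typedef decltype(member_call(*object)) R;
    if (object == nullptr) {
        return MaybeResult<R>{false, R()};   // no dereference on this path
    }
    return MaybeResult<R>{true, member_call(*object)};
}

// Example use: the length of a string that may be absent.
MaybeResult<std::size_t> length_of(const std::string* text) {
    return call_if_present(
        text, [](const std::string& s) { return s.length(); });
}
```

The caller then has to check has_value before using the result, which is exactly the extra state that a bare int return cannot carry.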

Finally, Buck, what everyone else is saying is the way things are, especially for the C++ language: dereferencing is a mechanical operation in most languages. It must return something of the same type, and null is typically stored as zero. Older systems would just crash when you tried to resolve zero; newer ones would recognize the special nature of the value when the error occurred.

Also, these lower-level languages cannot represent null as an integer (or other basic data types), so you could not in general dereference null as null in all cases.

BuckFilledPlatypus · Answer 15 · 2009-07-10T17:10:20.577

Tom's comment is correct: I did not initialize c, so the question is ambiguous at best, yet nearly everyone answered what I actually meant to ask. I unwittingly submitted the question while not logged in (sorry, I'm new to Stack Overflow), so can someone with editing powers update the OP?

//  #2
someObj * b;
anotherObj * c = new anotherObj();        //initialize c
b = NULL;
c->anotherfunc(*b);   // *b is in question not the c dereference

score -1 · Answer 16 · answered Jul 10 '09 at 15:16

NULL is just 0. Since 0 doesn't point to a real memory address, you can't dereference it. *b can't just resolve to NULL, since NULL is something that applies to pointers, not objects.

answered Jul 10 '09 at 15:16

Sean

0 is a real memory address. For good reasons, it has been made illegal to reference it on most platforms. Well, technically, referencing the first page of memory is generally illegal on those platforms. – Shing Yip Jul 10 '09 at 20:23
