4

It may sound like a silly question, but since in C, NULL is literally defined as

#define NULL 0

why can't it be a valid memory address? Why can't I dereference it, and why would it be impossible for any data to be at the memory address 0?

I'm sure the answer to this is something like "the first n bytes of memory are always reserved by the kernel", but I can't find anything like this on the internet.

Another part of my reasoning is that, wouldn't this be platform independent? Couldn't I invent a new architecture where the memory address 0 is accessible to processes?

Jacob Garby
  • 5
    The definition of `NULL` *is* implementation defined: https://en.cppreference.com/w/c/types/NULL – UnholySheep Jul 07 '18 at 11:57
  • @MartinR My question is not a duplicate of that. I'm asking _why 0 is not a valid memory address_, I'm not asking if dereferencing NULL is defined. – Jacob Garby Jul 07 '18 at 12:03
  • 1
    @JacobGarby Well, in the title you're asking why NULL is not a valid address. Not if 0 is, and NULL does not have to be defined as 0. – klutt Jul 07 '18 at 12:14
  • the address 0 is valid on many architectures, mainly Harvard architectures. On others like x86 it might just be mapped somehow to trigger a segfault when dereferencing, but technically the kernel can still access it. – phuclv Jul 07 '18 at 12:38

3 Answers

5

Dereferencing NULL is undefined behavior. Anything could happen, and most of the time bad things happen. So be scared.

Some old architectures (VAX ...) permitted you to dereference NULL.

The C11 standard specification (read n1570) does not require the NULL pointer to be all zero bits (see C FAQ Q5.17); it could be something else, but it should be an address which is never valid, so it is not obtainable by a successful malloc or by the address-of operator (unary &), in the sense of C11. But it is more convenient to have it be all zero bits, and in practice most (but not all) C implementations do so.

IIRC, on Linux, you might mmap(2) the page containing (void*)0 with MAP_FIXED, but it is not wise to do so (e.g. because a conforming optimizing compiler is allowed to assume that NULL is never dereferenced and to optimize accordingly).
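The mmap(2) attempt mentioned above can be sketched roughly as follows. This is only an illustration, assuming a Linux-like system with anonymous mappings; the function name try_map_page_zero is made up for the example. On many Linux systems the vm.mmap_min_addr setting makes the call fail for unprivileged processes, so the sketch reports the outcome instead of assuming one.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc in strict modes */
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: tries to map one page at address 0.
 * Returns 0 if the mapping succeeded (it is unmapped again right away),
 * or the errno value reported by mmap on failure. */
int try_map_page_zero(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return EINVAL;

    void *p = mmap((void *)0, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED)
        return errno;   /* commonly EPERM when low addresses are protected */

    /* Success: p is address 0 here. Release it without touching it. */
    munmap(p, (size_t)page);
    return 0;
}
```

A caller would compare the return value with 0 and, for example, with EPERM. Even when the mapping succeeds, dereferencing a null pointer remains undefined behavior as far as the C standard is concerned.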

So (void*)0 is not a valid address in practice (on common processors with some MMU and virtual memory running a good enough operating system!), because it is convenient to decide that it is NULL, and it is convenient to be sure that dereferencing it gives a segmentation fault. But that is not required by the C standard (and would be false on cheap microcontrollers today).

A C implementation has to provide some way to represent the NULL pointer (and guarantee that it is never the address of some valid location). That might even be done by convention: e.g. provide a full 2^32-byte address space, but promise never to use address 0 (or whatever address you assigned for NULL, perhaps 42!).
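The "reserve one address by convention" idea can be illustrated with a toy allocator. This is a minimal sketch, not how any real C implementation works: the arena, the handle type, and the names arena_alloc, arena_ptr, and NULL_HANDLE are all hypothetical. Handles are offsets into a small array, and offset 0 is simply never handed out, so it is free to mean "no object".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy "address space": handles are offsets into arena[].
 * By convention offset 0 is reserved and never returned by the
 * allocator, so it can serve as the null handle. */
#define ARENA_SIZE  1024u
#define NULL_HANDLE 0u

static unsigned char arena[ARENA_SIZE];
static uint32_t next_free = 1;   /* start at 1: offset 0 is never used */

/* Returns a nonzero offset on success, NULL_HANDLE when out of space. */
uint32_t arena_alloc(uint32_t size)
{
    if (size == 0 || size > ARENA_SIZE - next_free)
        return NULL_HANDLE;
    uint32_t handle = next_free;
    next_free += size;
    return handle;
}

/* Translates a handle into a real pointer; the null handle maps to NULL. */
void *arena_ptr(uint32_t handle)
{
    return handle == NULL_HANDLE ? NULL : &arena[handle];
}
```

Nothing in the hardware makes offset 0 special here; the allocator just promises not to use it. Any other reserved value (perhaps 42) would work the same way.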

When NULL happens to be dereferenceable, subtle bugs are not caught by a segmentation fault (so C programs are harder to debug).

Couldn't I invent a new architecture where the memory address 0 is accessible to processes?

You could, but you don't want to do that (if you care about providing any standard-conforming C implementation). You would prefer to make address 0 be NULL. Doing otherwise makes it harder to write C compilers (and standard C libraries). And making that address invalid, to the point of giving a segmentation fault when dereferencing it, makes debugging (and the life of your users coding in C) easier.

If you dream of weird architectures, read about Lisp machines (and Rekursiv, and iAPX 432) and see the talk "The circuit less traveled" given at FOSDEM 2018 by Liam Proven. It is a nice talk, and really instructive.

Basile Starynkevitch
  • 2
    0 is an absolutely valid memory address. It is the first element of addressable memory. However, many implementations _define_ 0 to be an invalid memory address for all practical use by applications and often have the memory management hardware trigger an exception to signal the application/developer there is something wrong. – Paul Ogilvie Jul 07 '18 at 13:13
  • To make my point, dereferencing 0 is _not a priori_ undefined behavior. It is only so because the architecture/memory management defines it as illegal (causing a seg fault, which is pretty well _defined_ behavior). – Paul Ogilvie Jul 07 '18 at 13:15
  • Did you mean C notion of `0` (as a pointer, it is the `NULL`), or the 0 address ? – Basile Starynkevitch Jul 07 '18 at 15:52
  • I mean the zero address. If an application points to the zero address (a zero pointer, or often the `NULL` pointer as in `#define NULL 0`) it generally points to the interrupt vector of interrupt 0 (X86). Since applications do not and cannot (should not) fiddle with interrupt vectors, the implementation defines it as illegal. – Paul Ogilvie Jul 08 '18 at 15:18
2

Making address zero unmapped so that a trap occurs if your program tries to access it is a convenience provided by many operating systems. It is not required by the C standard.

According to the C standard:

  • NULL is not the address of any object or function. (Specifically, it requires that NULL compare unequal to a pointer to any object or function.)
  • If you do apply * to NULL, the resulting behavior is not defined by the standard.

What this means for you is that you can use NULL as an indicator that a pointer is not pointing to any object or function. That is the only purpose the C standard provides for NULL: to use in tests such as if (p != NULL)…. The C standard does not guarantee that a trap will occur if you use *p when p is NULL.

In other words, the C standard does not require NULL to provide any trapping capability. It is just a value that is different from any actual pointer, provided just so you have one pointer value that means “not pointing to anything.”
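A minimal sketch of that "not pointing to anything" use, with hypothetical helper names (find_int and index_of are not from any library): a lookup returns NULL when there is no match, and the caller tests the result before applying *.

```c
#include <stddef.h>

/* Hypothetical helper: returns a pointer to the first element equal to
 * key, or NULL to signal "not pointing to anything". */
const int *find_int(const int *items, size_t count, int key)
{
    for (size_t i = 0; i < count; i++) {
        if (items[i] == key)
            return &items[i];
    }
    return NULL;
}

/* The caller tests against NULL before using the pointer.
 * Returns the index of key in items, or -1 if it is absent. */
long index_of(const int *items, size_t count, int key)
{
    const int *p = find_int(items, count, key);
    if (p != NULL)
        return (long)(p - items);
    return -1;
}
```

The check in index_of is what keeps the program inside defined behavior. Nothing in the standard promises a trap if the check is forgotten.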

General-purpose operating systems typically arrange for the memory at address zero to be unmapped (and their C implementations define NULL to be (void *) 0 or something similar) specifically so that a trap will occur if you dereference a null pointer. When they do this, they are extending the C language beyond what the specification requires. They deliberately exclude address zero from the memory map of your process to make these traps work.

However, the C standard does not require this. A C implementation is free to leave the memory at address zero mapped, and, when you apply * to a null pointer, there might be data there, and your program could read and/or write that data, if the operating system has allowed it. When this is done, it is most often in code intended to run inside the operating system kernel (such as device drivers, kernel extensions, or the kernel itself) or embedded systems or other special-purpose systems with simple operating systems.

Eric Postpischil
1

The null pointer constant (NULL) is 0-valued. The null pointer value may be something other than 0. During translation, the compiler will replace occurrences of the null pointer constant with the actual null pointer value.
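The distinction between the null pointer constant and the null pointer value can be probed with a small sketch. The function names are made up for the example. The first function relies only on what the standard guarantees; the second inspects the stored bytes, and its result is implementation-specific.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the usual spellings of the null pointer constant all
 * produce pointers that compare equal, as the standard guarantees. */
int null_spellings_agree(void)
{
    int *a = 0;          /* integer constant expression 0 */
    int *b = NULL;       /* the macro from <stddef.h> */
    int *c = (void *)0;  /* 0 cast to void * */
    return a == b && b == c && !a;
}

/* Returns 1 if this implementation happens to store a null pointer as
 * all-zero bytes, 0 otherwise. The C standard does not promise either
 * answer, so treat the result as implementation-specific. */
int null_is_all_zero_bits(void)
{
    int *p = NULL;
    unsigned char zeros[sizeof p] = {0};
    return memcmp(&p, zeros, sizeof p) == 0;
}
```

On most mainstream platforms the second function is expected to return 1, but that is an observation about those implementations, not a language guarantee.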

NULL does not represent “address 0”; rather, it represents a well-defined invalid pointer value that is guaranteed not to point to any object or function, and attempts to dereference invalid pointers lead to undefined behavior.

John Bode