
Possible Duplicate:
Is NULL always zero in C?

The C standard states the following for calloc():

The calloc function allocates space for an array of nmemb objects, each of whose size is size. The space is initialized to all bits zero.

with the following caveat relating to all bits zero:

Note that this need not be the same as the representation of floating-point zero or a null pointer constant.
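
(For completeness: if guaranteed null pointers are needed, the only portable approach I'm aware of is to assign NULL explicitly after allocation; a minimal sketch, with names of my own choosing:)

#include <stdlib.h>

/* Allocate an array of n char pointers and set each element to NULL
   explicitly, so every element is a genuine null pointer even on an
   implementation where the null pointer is not all bits zero. */
char **alloc_null_list(size_t n)
{
    char **list = malloc(n * sizeof *list);
    if (list != NULL)
    {
        size_t i;
        for (i = 0; i < n; i++)
            list[i] = NULL;
    }
    return list;
}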

Test program:

#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    char** list = calloc(10, sizeof(*list));
    int i;

    if (list == NULL)
        return 1;   /* allocation failed */

    for (i = 0; i < 10; i++)
    {
        /* cast to void* because %p expects a pointer to void */
        printf("%p is-null=%d\n", (void*)list[i], NULL == list[i]);
    }
    free(list);

    return 0;
}

I built and executed this program with the following compilers:

  • VC7, VC8, VC9, VC10
  • gcc v4.1.2, gcc v4.3.4
  • Forte 5.8, Forte 5.10

In all cases all bits zero is a NULL pointer (unless I made a mistake in the test program).

What is the reason a NULL pointer is not guaranteed by the C standard to be all bits zero? Out of curiosity, are there any compilers where all bits zero is not a NULL pointer?

hmjd
    [This](http://stackoverflow.com/questions/5142251/redefining-null) may be of interest. – Lundin Nov 06 '12 at 12:59
  • Why be overly specific in standardization if there is no need to be and no obvious benefit arises? – pmr Nov 06 '12 at 12:59
    @ecatmur While the linked question only asks "if", this question asks "why". I think this one is much better if one removes the `calloc` and compiler cruft. – pmr Nov 06 '12 at 13:02
  • There may be some such exotic systems; the better question is whether there are any *relevant* systems. I would say there's an overwhelming likelihood the answer is no. For what it's worth, there are a number of things the C standard allows (like non-twos-complement integers) that have *never* appeared in an implementation, and non-all-zero-bits null pointers are just about on this level of badness. – R.. GitHub STOP HELPING ICE Nov 06 '12 at 13:10
  • @R.. Actually, one's complement commonly exists in the niche of electronic scales systems. I suspect that many implementations of such systems would have used one's complement internally, though whether they use mainstream MCUs (with 2's compl.) or some custom solution, I have no idea. – Lundin Nov 06 '12 at 13:43
  • Even if the hardware is ones-complement-capable, there's no reason for the C implementation to use it. Having unsigned arithmetic in hardware (which C needs anyway) is sufficient to give twos complement, which is preferable in all ways. Likewise, the all-zero-bits pointer is always a sufficient implementation for the null pointer, regardless of hardware. There's no requirement that null pointer dereferences cause a trap/signal, so there's no need to use a special address that traps. – R.. GitHub STOP HELPING ICE Nov 06 '12 at 18:41
  • @R..: An implementation may have library routines in ROM that represent null using something other than all-bits-zero but otherwise use physical addresses and could potentially use all-bits-zero as a meaningful address. It would be possible for a C compiler to still use all-bits-zero to represent a null pointer if each pointer was stored as the difference between the desired pointer and the "official" null value. This would have some cost, but could simplify--sometimes greatly--some initialization code. Personally, I think there should be a standard dialect of C which... – supercat Apr 28 '15 at 15:58
  • ...mandates some common assumptions such as CHAR_BIT==8, two's-complement math, no trap representations for integers, and--absent directives otherwise--signed overflow yields Partially-Indeterminate value (which could arbitrarily behave as a number outside the type's range), left-shifts of negative numbers behave like repeated multiplication (same overflow consequences), oversized shifts arbitrarily behave as `n` shifts or `n % SIZE` shifts, etc. And of course all-bits-zero being defined as a default value for pointers and floats as well as integers. – supercat Apr 28 '15 at 16:07
  • @supercat: Some of that is too strict. Especially indeterminate overflow; that might as well be determinate modular overflow. The benefit of UB rather than partially-indeterminate value is that it can be *observed differently* in different places and thereby allows a lot of optimization. If you have to ensure a consistent but unspecified result is seen everywhere, you might as well dictate unsigned-like behavior. – R.. GitHub STOP HELPING ICE Apr 28 '15 at 16:10
  • I would much rather see such a dialect impose requirements on representations (like twos complement, no padding, octet bytes, int at least 32-bit, ...) which are easy to provide almost anywhere and are only gratuitous generality in the C standard right now, than try to impose limitations on optimization. POSIX is probably the right place to start that process; there's an open issue calling for a requirement that all-zero-bits be interpreted as a null pointer, an POSIX already has some of the other requirements like `CHAR_BIT==8`. – R.. GitHub STOP HELPING ICE Apr 28 '15 at 16:11
  • @R..: "Clean" modular overflow is not free, nor even is Unspecified Value. Partially-Indeterminate Value would among other things mean that a 64-bit system with a 32-bit `int` could perform computations in 64-bit register without being obliged to either trim the results to 32 bits nor keep excess precision when spilling registers, but if there was a rule that an *explicit* cast on a PIV would clip it, then given `int32_t w = y*z, x=(int32_t)w;` variable `w` might or might not be in the range of an `int32_t` but `x` would hold the correct value mod 2^32. – supercat Apr 28 '15 at 16:27
  • @R: If there could be some new integer types added to the language with different semantics, I'd add `wrap_t(N)` which would be guaranteed to wrap mod 2^N, available for any N [compiler adding whatever masking code was required], and `snum32_t`, `unum32_t`, `tsnum32_t`, and `tunum32_t` [likewise for other common sizes] for numbers that were not supposed to exceed their ranges; the `st` and `ut` variants would set a flag on overflow or *out-of-range cast*. I think such things could offer better optimizations than existing types while better expressing programmer intent. – supercat Apr 28 '15 at 16:34

2 Answers


The comp.lang.c FAQ answers this in question 5.16, where it explains why this happens, and question 5.17, where it gives examples of actual machines with nonzero NULL. A relatively "common" case is for address 0 to be valid, so a different invalid address is selected for NULL. A more esoteric example is the Symbolics Lisp Machine, which does not even have pointers as we know them.

So it's not really a question of choosing the right compiler. On a modern byte-addressable system, with or without virtual memory, you are unlikely to encounter a NULL pointer that is not address 0.

The C standard is carefully designed to accommodate hardware that is downright bizarre, and this is just one of the results. Another weird gotcha along the same lines is that it's possible for sizeof(void *) != sizeof(int *), but you'll never see it happen on a byte-addressable architecture.
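
If you want to probe a particular implementation directly, a minimal sketch (mine, not from the FAQ) is to overwrite a pointer's object representation with zero bytes and compare it to NULL; note that even reading the pointer back is only well-defined if all bits zero happens to be a valid pointer representation there:

#include <stdio.h>
#include <string.h>

int main(void)
{
    int *p;

    /* Force the object representation of p to all bits zero ... */
    memset(&p, 0, sizeof p);

    /* ... and see whether that compares equal to a null pointer. On every
       mainstream implementation this prints 1, but the standard does not
       guarantee it, and on an exotic system the all-zero pattern could even
       be a trap representation for pointers. */
    printf("all-bits-zero == NULL: %d\n", p == NULL);

    return 0;
}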

Dietrich Epp

Yes, there are some implementations (some of them exotic) where the null pointer's internal representation is not all-bits-zero. This section of the C FAQ is very interesting (especially these two: 1 and 2).
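
Worth stressing (my own note, not from the FAQ): even on such an implementation, writing 0 or NULL in a pointer context still yields a correct null pointer, because the compiler converts the constant to whatever internal representation the platform uses; only code that zeroes the bytes of a pointer object is affected. Roughly:

#include <stddef.h>
#include <stdio.h>

int main(void)
{
    int *a = 0;      /* the integer constant 0 is converted to a null pointer */
    int *b = NULL;   /* same thing: NULL expands to a null pointer constant */

    /* Guaranteed to print 1 on every conforming implementation, whatever bit
       pattern is used internally for null pointers; only byte-wise tricks
       such as memset(&p, 0, sizeof p) or calloc'd storage depend on that
       internal representation. */
    printf("%d\n", a == b);

    return 0;
}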

md5