29

I know C compilers aren't required to use all zeros for the bit representation of NULL, but they *are* required by the standard to make NULL evaluate to false in boolean contexts/comparisons. Hence the 2nd printf in the program below will always output false.

But what I want to know is: on systems where NULL is *not* all zeros, will a pointer value that *is* all zeros also evaluate to false in boolean contexts/comparisons? In other words, will the 1st printf in the program below ever output true?

Or asked in a slightly different way: can I rely on calloc to produce a pointer value that will always evaluate to false in boolean contexts/comparisons? The 1st answer to this question uses memset to clear the bits of a long* named y, then goes on to say that y==0 is UB because y may be a "trap representation" (whatever that is). calloc is also just clearing bits, so maybe o->p in the 1st printf is also UB?


#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

typedef struct { void * p; } obj;

int main() {
    obj * o = calloc(sizeof(obj), 1);
    assert(o);  // assume successful allocation
    printf("%s\n", o->p ? "true" : "false");  // 1st: could print "true"?  Is o->p UB?
    o->p = NULL;
    printf("%s\n", o->p ? "true" : "false");  // 2nd: always prints "false"
    return 0;
}
textral
  • 2
    Are you aware of an architecture where a null pointer isn't all zero bits? I don't think I've ever seen one, and that includes weirdos like segmented 16-bit x86. – Mark Ransom Aug 27 '20 at 18:27
  • ... and then consider `union { void *p; int i; }` – Hagen von Eitzen Aug 27 '20 at 19:41
  • 1
    Note that a pointer in C isn't necessarily really a number, it just behaves in some cases as one (you can do arithmetics, etc.). That's why C also doesn't define `%p` for printf, as there doesn't need to be a consistent representation of "pointer". FWIW, a pointer could be literally an arrow pointing to some "object", C is very abstract in that regard. Thus, asking about a "bit representation" usually isn't really meaningful for a strict reading of the standard. – ljrk Aug 27 '20 at 21:03
  • @HagenvonEitzen: interesting thought experiment... what would a C impl that uses non-zero bit null pointers do when a user reads `p` out of that union if it had previously written 0 into `i`? Would it honor the bit rep and return a zero-bits *non-null* pointer value, or will it honor the intent of the programmer and return a non-zero bits null pointer? I'm not sure it could safely infer the programmer's intent here, so I'd guess it honors the bit rep? I don't really know... good question! – textral Aug 28 '20 at 10:19
  • 1
    @textral How do you conclude the intent of the programmer from here? I can only tell that the programmer's intent was to write an all-bits-zero value there. Whether the intent was to have a null pointer or not, I cannot know. If in doubt, I have to assume the programmer knows his target platform and knows what an all-zero pattern is and what it is not. – glglgl Aug 28 '20 at 10:25
  • 1
    @MarkRansom I worked on a C compiler for the CDC Cyber 180 line. These machines were intended to run a Multics-like operating system and so its 48-bit pointers included a 4 bit ring number. Only code running in ring 0 could create pointers that had the ring number set to 0. Hence we had null pointers which were not all 0's. If a pointer was used in a branch, we would move it to an integer register and mask out the ring number before testing it for all zeros; the second println statement would print "false". The first println accesses an uninitialized location and so is undefined behaviour. – Theodore Norvell Aug 28 '20 at 15:35
  • 1
    @TheodoreNorvell thanks for the specific example. I learned assembler on a CDC Cyber 6400, and of course it worked completely differently - the address registers were only 18 bit, and there was no ring concept. Each process had its own address space, and supervisory functions were handled by a separate set of peripheral processors. – Mark Ransom Aug 28 '20 at 16:02
  • @MarkRansom: Some hardware platforms will trap attempts to access certain addresses, but address zero can be accessed just like any other. On such platforms, there would be some advantages to having a null pointer be an address that would trap. There are also advantages, however, to representing a null pointer using all-bits-zero. Someone writing a compiler for such a system would be better placed than the Committee to judge which set of advantages outweighed the other. – supercat Aug 28 '20 at 21:19
  • @supercat I wasn't trying to argue for or against the concept of a processor that used a non-zero pointer for null, just was curious to know if there were any real-life examples. – Mark Ransom Aug 28 '20 at 23:07
  • @MarkRansom: There were certainly platforms where such a thing would have had advantages, and I know that historically some implementations have used pointers which were chosen to force hardware traps. What I don't know is whether any new implementations have actually done so since the 1980s. – supercat Aug 28 '20 at 23:16
  • 1
    @MarkRansom: Ironically, even as the C89 Committee was deciding to make accommodations for obscure C implementations, the C compiler marketplace was doing the exact opposite--recognizing that the compatibility advantages of having C implementations behave in a fashion analogous to the PDP-11 when practical outweighed most advantages that could be reaped by doing something else. – supercat Aug 28 '20 at 23:18
  • @MarkRansom The 180 was a weird mix of RISC (in the 6400 tradition) and CISC. It had 64-bit X registers and 48-bit A registers (no B registers). Unlike the 6400 memory was byte addressable, segmented, ringed, and virtual! – Theodore Norvell Aug 29 '20 at 21:47
  • Given "`NULL` which expands to an implementation-defined null pointer constant;" and "An integer constant expression with the value 0, or such an expression cast to type `void *`, is called a null pointer constant", then "When `NULL` is not all-zero-bits" is only true when `NULL` is a pointer or an _integer_ of -0 of some width, or a 0 with non-zero padding. 2nd & 3rd are at best rare, if not unheard of. – chux - Reinstate Monica Sep 12 '20 at 13:45

5 Answers

12
typedef struct { void * p; } obj;
obj * o = calloc(sizeof(obj), 1);
assert(o);  // Let us set aside the case of a failed allocation
printf("%s\n", o->p ? "true" : "false");  // 1st: could print "true" ?

can I rely on calloc to produce a pointer value that will always evaluate to false in boolean contexts/comparisons?

No - output could be "true".*1

The bit pattern of all zeros, as a pointer, may not be a null pointer.

7.22.3.2 The calloc function
2 The calloc function allocates space for an array of nmemb objects, each of whose size is size. The space is initialized to all bits zero.301)
Footnote 301) Note that this need not be the same as the representation of floating-point zero or a null pointer constant.


Example: An implementation may have only a single null pointer encoding, with a bit pattern of all ones. (void *)0 converts the all-zeros bit pattern int 0 to an all-ones void *. if (null_pointer) is always false, regardless of the bit pattern of the null pointer.


*1 Yet practically yes, output is always "false". Implementations that do not use the all-zero bit pattern as a null pointer are uncommon these days. Highly portable code would not assume this practicality. Consider that an old or novel new system may use the all-zero bit pattern as a non-null pointer, and sadly break many a code base that assumes an all-zero bit pattern is a null pointer.
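
For code that wants to stay on the portable side, one option is to assign the pointer member explicitly instead of relying on the zeroed bits. The following is a minimal sketch only; obj_new is a hypothetical helper name that does not appear in the question.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { void *p; } obj;

/* Hypothetical helper: allocate an obj whose pointer member is a genuine
   null pointer, without assuming that all-bits-zero is a null pointer. */
obj *obj_new(void) {
    obj *o = malloc(sizeof *o);
    if (o == NULL) {
        return NULL;   /* allocation failed */
    }
    o->p = NULL;       /* converts NULL to this implementation's null pointer */
    assert(!o->p);     /* holds on every conforming implementation */
    return o;
}
```

A caller would write obj *o = obj_new(); and later free(o);. The assignment costs one store and removes the dependency on how the implementation encodes null pointers.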

chux - Reinstate Monica
  • Thanks @chux, that justifies my uneasiness in using this code (though as you say, there are practically no implementations out there that behave in such an unintuitive way). Is the other answer I linked to in my question right: is it considered UB to even reference such a value? – textral Aug 27 '20 at 07:42
  • And also, is the community answer that @Howlium linked to *wrong*? Or am I just misreading it? (I need to read it more closely... maybe they never refer to initialization at the bit level.) – textral Aug 27 '20 at 07:45
  • @textral It is always UB to de-reference a _null pointer_, regardless of its bit encoding. A bit pattern of all zeros interpreted as an `int, long, long` etc. then cast to a `void *` is a _null pointer_. That does not mean the _null pointer_ is also a bit pattern of all zeros. – chux - Reinstate Monica Aug 27 '20 at 07:46
  • They weren't dereferencing a ptr, but simply comparing its value to 0. Which they claimed was UB. (See the 1st answer in this question: https://stackoverflow.com/questions/21386995/is-int-0-a-null-pointer) – textral Aug 27 '20 at 07:50
  • @textral Comparing a _null pointer_ `np` to `int 0` as in `np == 0` or the like `!np` is always true. The `int 0` is converted to a _null pointer_ (that pointer's bit pattern may not be all zeros) and then `np` is compared against a _null pointer_. All _null pointers_ compare as equal, even if the system has multiple bit patterns that constitute _null pointers_. Pointer compare need not be a simple bit pattern compare. – chux - Reinstate Monica Aug 27 '20 at 07:59
  • 2
    In the second code fragment, the 6.3.2.3 3 isn't relevant. The expression `o->i` isn't an *integer constant expression*. – M. Nejat Aydin Aug 27 '20 at 08:06
  • I think the claim *"Contrast this to the below which always prints "false""* is incorrect. Expression `(void*)o->i` should be interpreted by compiler as `o->i != 0x00000000` or `o->i != 0xDEADBEEF` depending on what kind of NULL pointer system uses. – user694733 Aug 27 '20 at 08:20
  • @M.NejatAydin Interesting observation. Second code fragment removed. – chux - Reinstate Monica Aug 27 '20 at 08:41
  • 1
    @chux-ReinstateMonica: An implementation may specify the behavior of dereferencing a null pointer--sometimes usefully. On typical compilers for certain Motorola/Freescale microcontrollers where the control register for port A is located at address zero, attempting to store 0x42 to `*(unsigned char*)0` would set the PORT A control register to 0x42, and that would be the normal way of setting that register's value (perhaps `*((unsigned short*)0xFFFF) = 0x42;` would work if hardware ignored writes to 0xFFFF, but that seems jankier than writing to a literal address zero). – supercat Aug 27 '20 at 22:39
  • @supercat Interesting. IMO this is one of the strengths of C: it allows novel ideas to be tried and still compliant with C. Of course the market does reject some of those ideas. – chux - Reinstate Monica Aug 27 '20 at 22:42
  • @chux-ReinstateMonica: What the Standard requires of null pointers is that an implementation refrain from assigning any object an address that compares equal to them. If a hardware or OS designer happens to put something useful there, an implementation may opt to allow or disallow access, just as would be the case for any other location the hardware or OS might put things. – supercat Aug 27 '20 at 23:07
  • 1
    Sorry for the long pause @chux. Going back to our conversation (in the comments): so if the all-zeros bit pattern written by memset happened to be a *null pointer* value according to the C implementation, then referencing `y` and comparing it to 0 is well defined. But the code may find its way onto a system where the C impl only uses non-zero bit *null pointers*, in which case `y==0` is comparing a *non-null ptr* to zero, so 0 is not converted to a pointer type, and hence the comparison is UB. Do I understand correctly? – textral Aug 28 '20 at 10:32
  • @textral Yes, with the proviso that "may find its way onto a system where the C impl only uses non-zero bit null pointers" is uncommon. – chux - Reinstate Monica Aug 28 '20 at 14:09
  • @textral: No version of the C Standard has made much effort to distinguish between situations where an action would be sometimes defined and sometimes not, based upon information that would be available to the programmer (like the documented formats for various data types), versus actions which should be regarded as Undefined Behavior even on systems or in cases where everything about the behavior would otherwise be specified. For example, on a sign-magnitude system it might make sense for `x<<1` to do something weird if `x` is negative, but on a two's-complement system... – supercat Aug 28 '20 at 16:18
  • ...without padding bits, it should simply be equivalent to `x+x`. Interestingly, C89 defined the behavior in both cases (even though the behavior may have been inappropriate for sign-magnitude systems), while C99 reclassified it as UB on all systems. – supercat Aug 28 '20 at 16:19
6

Background information

Consider the following places where the logical value of an expression is used, all taken from C18, my emphasis in bold italic:

  • 6.3.1.2 (Boolean type) p1: When any scalar value is converted to _Bool, the result is 0 if the value compares equal to 0; otherwise, the result is 1.

  • 6.5.3.3 (Unary arithmetic operators) p5: The result of the logical negation operator ! is 0 if the value of its operand compares unequal to 0, 1 if the value of its operand compares equal to 0. The result has type int. The expression !E is equivalent to (0==E).

  • 6.5.13 (Logical AND operator) p3: The && operator shall yield 1 if both of its operands compare unequal to 0; otherwise, it yields 0. The result has type int.

  • 6.5.14 (Logical OR operator) p3: The || operator shall yield 1 if either of its operands compare unequal to 0; otherwise, it yields 0. The result has type int.

  • 6.5.15 (Conditional operator) p4: The first operand is evaluated; there is a sequence point between its evaluation and the evaluation of the second or third operand (whichever is evaluated). The second operand is evaluated only if the first compares unequal to 0; the third operand is evaluated only if the first compares equal to 0; the result is the value of the second or third operand (whichever is evaluated), converted to the type described below.

  • 6.8.4.1 (The if statement) p2: In both forms, the first substatement is executed if the expression compares unequal to 0. In the else form, the second substatement is executed if the expression compares equal to 0. If the first substatement is reached via a label, the second substatement is not executed.

  • 6.8.5 (Iteration statements) p4: An iteration statement causes a statement called the loop body to be executed repeatedly until the controlling expression compares equal to 0. The repetition occurs regardless of whether the loop body is entered from the iteration statement or by a jump.

"E compares equal to 0" is equivalent to the C expression (E == 0), and "E compares unequal to 0" is equivalent to the C expression (E != 0). The constraints of the equality operators are given by:

  • 6.5.9 (Equality operators) p2: One of the following shall hold:
    • both operands have arithmetic type;
    • both operands are pointers to qualified or unqualified versions of compatible types;
    • one operand is a pointer to an object type and the other is a pointer to a qualified or unqualified version of void; or
    • one operand is a pointer and the other is a null pointer constant.

Regarding the semantics of the equality operators where at least one operand is a pointer:

  • 6.5.9 (Equality operators) p5: Otherwise, at least one operand is a pointer. If one operand is a pointer and the other is a null pointer constant, the null pointer constant is converted to the type of the pointer. If one operand is a pointer to an object type and the other is a pointer to a qualified or unqualified version of void, the former is converted to the type of the latter.

  • p6: Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.

Regarding null pointer constants:

  • 6.3.2.3 (Pointers) p3: An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant.67) If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.

OP's questions

But what I want to know is: on systems where NULL is not all zeros, will a pointer value that is all zeros also evaluate to false in boolean contexts/comparisons?

Aside: NULL is a null pointer constant, not necessarily a null pointer (see 6.3.2.3p3 above where it could be an integer constant expression). What you really mean is a system where the bit representation of a null pointer is not all zeros.

Note: As pointed out by Eric Postpischil in the comments below, a system could have several bit representations of null pointer values, so we assume that none of them are all-zero bit representations for this question.

In order for the pointer value to evaluate to false in boolean contexts/comparisons, it must compare equal to 0. In this context, it must compare equal to a null pointer constant. By 6.5.9p5 above, the null pointer constant will be converted to the type of the pointer it is being compared to. By 6.5.9p6 above, a null pointer value will not compare equal to a non-null pointer value. So a non-null pointer value with all bits zero, on a system where a null pointer value is not all bits zero, will evaluate to true in a boolean context.

Or asked in a slightly different way: can I rely on calloc to produce a pointer value that will always evaluate to false in boolean contexts/comparisons?

No, you cannot rely on calloc (or memset with byte value 0) to produce a pointer value that will evaluate to false in boolean contexts. If a pointer value with an all-zero bit representation is not a null pointer value it will evaluate to true in boolean contexts.
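
To find out how a particular implementation stores the null pointer it produces from NULL, the bytes of such a pointer can be examined through unsigned char, which is always permitted. This is a rough sketch under stated assumptions: null_is_all_bits_zero is a hypothetical helper name, and it only looks at the one representation produced by converting NULL to void *. A result of 0 therefore does not rule out that all-bits-zero is one of several null pointer representations.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the null pointer obtained by converting
   NULL to void * is stored as all-bits-zero here, 0 otherwise. */
int null_is_all_bits_zero(void) {
    void *null_ptr = NULL;
    unsigned char zeros[sizeof null_ptr];

    memset(zeros, 0, sizeof zeros);   /* reference pattern: every byte zero */
    assert(!null_ptr);                /* a null pointer is always false */

    /* Compare object representations byte by byte, not pointer values. */
    return memcmp(&null_ptr, zeros, sizeof null_ptr) == 0;
}
```

The check inspects the stored bytes only. It never reads an all-zero bit pattern back as a pointer value, which is the step the question is worried about.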

Ian Abbott
  • “If a null pointer has a not all-zero bit representation, a pointer value with an all-zero bit representation will evaluate to true in boolean contexts” is not a valid inference. An implementation may have multiple bit patterns that constitute null pointers, including one that is all zeros and some that are not. This answer (and others here) fail to be clear in this regard. – Eric Postpischil Aug 27 '20 at 10:51
  • @EricPostpischil OK I should change that to say a non-null pointer value with an all-zero bit representation will evaluate to true in boolean contexts. – Ian Abbott Aug 27 '20 at 10:54
3

There's a great discussion of NULL and 0 in the first answer to this question: What is the difference between NULL, '\0' and 0?

The punchline in that answer is:

Note that what counts as a null pointer in the C language does not depend on the underlying architecture. If the underlying architecture has a null pointer value defined as address 0xDEADBEEF, then it is up to the compiler to sort this mess out.

…Even on this funny architecture, the following ways are still valid ways to check for a null pointer:

if (!pointer)
if (pointer == NULL)
if (pointer == 0)

And in the second answer to the same question…

A constant expression of type int with the value 0, or an expression of this type, cast to type void * is a null pointer constant, which if converted to a pointer becomes a null pointer. It is guaranteed by the standard to compare unequal to any pointer to any object or function.

(Short answer, yes, you can check for a NULL pointer with if (!ptr)).
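
As a small illustration of the three spellings quoted above, the sketch below evaluates each of them on the same pointer and reports whether they agree. The helper name null_checks_agree is hypothetical and is not taken from either linked answer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 when the three spellings of the null test
   give the same answer for p, which the standard requires for any valid p. */
int null_checks_agree(const void *p) {
    int a = !p;            /* equivalent to (0 == p) */
    int b = (p == NULL);   /* NULL is a null pointer constant */
    int c = (p == 0);      /* 0 is converted to a null pointer of p's type */

    return a == b && b == c;
}
```

Calling it with NULL and with the address of any object should return 1 in both cases. Note that this says nothing about a pointer whose bits were zeroed by calloc or memset, which is a separate matter.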

Howlium
  • 4
    But this doesn't answer to the crucial question: *will the 1st printf in the program below ever output true?* – M. Nejat Aydin Aug 27 '20 at 06:52
  • Wait a minute @Howlium, I found another answer that now confuses me (https://stackoverflow.com/questions/21386995/is-int-0-a-null-pointer). The 1st answer uses memset to set the bits of a long* (y) to 0, then goes on to say that y == 0 is UB because y may be a trap representation. calloc clears bits just like memset does in that answer, so why is it any different? – textral Aug 27 '20 at 07:02
  • @M.NejatAydin I guess not. The compiler knows the data types of the struct and will act accordingly. – stderr Aug 27 '20 at 07:05
  • 3
    @stderr I'm not convinced. The assignment `obj *o = calloc(sizeof(obj), 1)` sets all bits of the pointer `o->p` to zero. But that object representation (all bits zero) doesn't have to represent a null pointer. – M. Nejat Aydin Aug 27 '20 at 07:31
  • @M.NejatAydin No, you're right. See the answer chux provided. C89 says the same. Footnote 127: Note that this need not be the same as the representation of floating-point zero or a null pointer. – stderr Aug 27 '20 at 07:47
2

Core Answer

But what I want to know is: on systems where NULL is *not* all zeros, will a pointer value that *is* all zeros also evaluate to false in boolean contexts/comparisons?

In a C implementation, the C standard allows any of:

  • All-bits-zero is a null pointer and no other bit pattern is.
  • All-bits-zero is a null pointer and one or more other bit patterns are.
  • All-bits-zero is not a null pointer and one or more other bit patterns are.

In other words, a C implementation may designate any one or more bit patterns to be null pointers, and this may or may not include all-bits-zero. (If the C implementation does allow multiple bit patterns to be null pointers, it must ensure they compare equal.)

… will the 1st printf in the program below ever output true?

It is allowed that it print “true”; the result of calloc is memory with all bits zero, and interpreting that memory as a void * may result in a pointer value that is not a null pointer value.

Supplement

… where NULL is *not* all zeros…

NULL is only something in source code. It is either 0 or ((void *) 0) or an equivalent. Wherever it is used as a pointer in source code (that is, you are doing normal things like if (pointer != NULL), not kludges like int x = 3 + NULL;), the compiler effectively converts it to a null pointer. That is, if all-bits-zero is not a null pointer in the C implementation, the compiler will compile pointer != NULL to a comparison of pointer to some bit pattern that does represent a null pointer.

So your questions are all about null pointers; they are not about NULL.

… on systems where…

The final determination of what is a null pointer lies with the C implementation, not the system it executes on. A C implementation may represent pointers in any way it wants and transform them as necessary when using machine addresses in instructions.

Eric Postpischil
0

You can avoid such questions with an explicit and defensive coding style.

If you have a pointer _p, write constructs like

    (_p==NULL)?(A):(B)

Now any reader knows immediately that your intent is to check whether _p is equal to NULL, and even on a machine where NULL might be different from an integer value of 0, the compiler will automatically do it correctly. A static code checker will also not warn you about relying on implicit behaviour.

    (_p)?(A):(B)

is just not doing it right.

But apart from that, it's an interesting technical question.

An interesting talk from the C++ committee from 2019 or 2020 revealed that even these guys think about dropping compatibility with some odd undefined behaviour, which was needed back before 1970 for some 3-4 architectures. There wasn't any known use of this stuff in the last decades, at least to my knowledge. As the first comment to your question states: you will hardly find any machine with such an issue, at least outside of a museum.

schnedan
  • 2
    Whether there's anything wrong with the `(_p)?(A):(B)` option is strictly a question of style. If `_p`'s value is a null pointer then that expression will evaluate to the same thing as `(B)`, otherwise to the same thing as `(A)`, regardless of `_p`'s bit pattern. This is *exactly* equivalent to your other version. At best, saying that one "is just not doing it right" is subject to misinterpretation. – John Bollinger Sep 02 '20 at 18:02
  • Well, I explicitly said: style. Not everything a compiler accepts and not everything in a standard - at least for backward compatibility - is good code. And (_p)?()() is just not good code. And as this question showed, it's not safe from misinterpretation. – schnedan Sep 02 '20 at 18:52
  • 1
    @schnedan "(_p)?()() is just not good code" Well, that's subjective. "And as this question showed, it's not safe from misinterpretation" Depends on who interprets it. – glglgl Sep 03 '20 at 09:42
  • If you ever happen to debug a huge project, which other people have developed, you will train yourself to produce code which is clear, defensive, maintainable,... – schnedan Sep 03 '20 at 18:08
  • Any ref to the "Talk from the C++ committee from 2019 or 2020"? Would like to see what ideas are being discussed about that. – chux - Reinstate Monica Sep 12 '20 at 13:37
  • "about dropping compatibility to some odd undefined behaviour" which I mentioned was e.g. about defining signed integers to be two's complement. C++ inherited from C that this is undefined, but no machine since 1970 was built with a different number system. And things like (_p)?(A):(B) are addressed by recommendations, as changes would break old code and would never make it into the standard. But any professional not declaring that to be bad code/style is a non-professional in my perspective. And most new features in C++ make code more explicit and more error-proof. – schnedan Sep 12 '20 at 13:47
  • @chux-ReinstateMonica - I forgot to answer your question: the Talks are all on youtube... you find all of them easily. I found of 5 are worth the time spent watching – schnedan Sep 14 '20 at 06:59