Can a conforming C implementation #define NULL to be something wacky

Question

I'm asking because of the discussion that's been provoked in this thread.

Trying to have a serious back-and-forth discussion using comments under other people's replies is not easy or fun. So I'd like to hear what our C experts think without being restricted to 500 characters at a time.

The C standard has precious few words to say about NULL and null pointer constants. There are only two relevant sections that I can find. First:

3.2.2.3 Pointers

An integral constant expression with the value 0, or such an expression cast to type void * , is called a null pointer constant. If a null pointer constant is assigned to or compared for equality to a pointer, the constant is converted to a pointer of that type. Such a pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.

and second:

4.1.5 Common definitions

The macros are
NULL
which expands to an implementation-defined null pointer constant;

The question is, can NULL expand to an implementation-defined null pointer constant that is different from the ones enumerated in 3.2.2.3?

In particular, could it be defined as:

#define NULL __builtin_magic_null_pointer

Or even:

#define NULL ((void*)-1)

My reading of 3.2.2.3 is that it specifies that an integral constant expression of 0, and an integral constant expression of 0 cast to type void* must be among the forms of null pointer constant that the implementation recognizes, but that it isn't meant to be an exhaustive list. I believe that the implementation is free to recognize other source constructs as null pointer constants, so long as no other rules are broken.

So for example, it is provable that

#define NULL (-1)

is not a legal definition, because in

if (NULL) 
   do_stuff();

do_stuff() must not be called, whereas with

if (-1)
   do_stuff();

do_stuff() must be called; since they are equivalent, this cannot be a legal definition of NULL.
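To make that argument concrete, here is a minimal sketch; the function names are invented for illustration and are not from the standard. On any conforming implementation both checks should hold, which is only possible if NULL does not expand to (-1).

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the body of `if (NULL)` is skipped, as the standard requires. */
int if_null_is_skipped(void)
{
    if (NULL)
        return 0;
    return 1;
}

/* Returns 1 if the body of `if (-1)` runs, as it must for a nonzero value. */
int if_minus_one_runs(void)
{
    if (-1)
        return 1;
    return 0;
}

/* Both results can only hold together if NULL is not defined as (-1). */
void check_null_is_not_minus_one(void)
{
    assert(if_null_is_skipped() == 1);
    assert(if_minus_one_runs() == 1);
}
```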

But the standard says that integer-to-pointer conversions (and vice-versa) are implementation-defined, therefore it could define the conversion of -1 to a pointer as a conversion that produces a null pointer. In which case

if ((void*)-1)

would evaluate to false, and all would be well.

So what do other people think?

I'd ask for everybody to especially keep in mind the "as-if" rule described in 2.1.2.3 Program execution. It's huge and somewhat roundabout, so I won't paste it here, but it essentially says that an implementation merely has to produce the same observable side-effects as are required of the abstract machine described by the standard. It says that any optimizations, transformations, or whatever else the compiler wants to do to your program are perfectly legal so long as the observable side-effects of the program aren't changed by them.

So if you are looking to prove that a particular definition of NULL cannot be legal, you'll need to come up with a program that can prove it. Either one like mine that blatantly breaks other clauses in the standard, or one that can legally detect whatever magic the compiler has to do to make the strange NULL definition work.

Steve Jessop found a way for a program to detect that NULL isn't defined to be one of the two forms of null pointer constant in 3.2.2.3, which is to stringize the constant:

#define stringize_helper(x) #x
#define stringize(x) stringize_helper(x)

Using this macro, one could

puts(stringize(NULL));

and "detect" that NULL does not expand to one of the forms in 3.2.2.3. Is that enough to render other definitions illegal? I just don't know.
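Put together as something callable, the detection could look roughly like this. The two function names are my own, and the string you get back depends entirely on the implementation's headers, so no particular output is assumed here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define stringize_helper(x) #x
#define stringize(x) stringize_helper(x)

/* Spelling of whatever NULL expands to; the exact text is implementation-specific. */
const char *null_spelling(void)
{
    return stringize(NULL);
}

/* Returns 1 if NULL is spelled exactly "0", and 0 for any other spelling. */
int null_is_spelled_plain_zero(void)
{
    const char *s = null_spelling();
    assert(s[0] != '\0');   /* NULL has to expand to at least one token */
    return strcmp(s, "0") == 0;
}
```

Calling puts(null_spelling()) prints the spelling, which is how a program could observe an unusual definition.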

Thanks!

For what it's worth, I've started a thread in the comp.std.c usenet group, asking this same question. Many of the experts there are actually members of the standards committee, and some of them probably know K and/or R personally. I'll let everyone here know if they come up with some dusty corner of standardese that clarifies this definitively. — janks, Apr 08 '10 at 12:28
Is there any requirement that the `#define` of `NULL` match one of the forms listed, or only that `NULL` must expand to something which would match one of those forms? The pair of definitions `#define __NULL 0` and `#define NULL __NULL` would yield the proper expansion after preprocessing, but I believe it would stringize as `__NULL`. — supercat, Mar 06 '12 at 00:52

score 14 · Answer 1 · edited Jan 20 '15 at 17:50

In the C99 standard, §7.17.3 states that NULL “expands to an implementation-defined null pointer constant”. Meanwhile §6.3.2.3.3 defines null pointer constant as “an integer constant expression with the value 0, or such an expression cast to type void *”. As there is no other definition for a null pointer constant, a conforming definition of NULL must expand to an integer constant expression with the value zero (or this cast to void *).

Further quoting from the C FAQ question 5.5 (emphasis added):

Section 4.1.5 of the C Standard states that NULL “expands to an implementation-defined null pointer constant,” which means that the implementation gets to choose which form of 0 to use and whether to use a `void *` cast; see questions 5.6 and 5.7. “Implementation-defined” here does not mean that NULL might be #defined to match some implementation-specific nonzero internal null pointer value.

It makes perfect sense; since the standard requires a zero integer constant in pointer contexts to compile into a null pointer (regardless of whether or not the machine's internal representation of that has a value of zero), the case where NULL is defined as zero must be handled anyhow. The programmer is not required to type NULL to obtain null pointers; it's just a stylistic convention (and may help catch errors e.g. when a NULL defined as (void *)0 is used in a non-pointer context).
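A small sketch of that distinction between the source-level constant and the machine representation (the function names are made up for this example). The first function must return 1 everywhere; the second only reports what this particular platform does, and the standard does not promise either answer.

```c
#include <assert.h>
#include <string.h>

/* A literal 0 in a pointer context always yields a null pointer. */
int zero_constant_gives_null(void)
{
    int *p = 0;
    assert(!p);
    return p == (void *)0;
}

/* Reports whether this platform stores a null pointer as all-bits-zero.
   The standard allows either result. */
int null_is_all_bits_zero(void)
{
    void *p = 0;
    unsigned char zeros[sizeof p];
    memset(zeros, 0, sizeof zeros);
    return memcmp(&p, zeros, sizeof p) == 0;
}
```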

Edit: One source of confusion here seems to be the concise language used by the standard, i.e. it does not explicitly say that there is no other value that might be considered a null pointer constant. However, when the standard says “…is called a null pointer constant”, it means that exactly the given definitions are called null pointer constants. It does not need to explicitly follow every definition by stating what is non-conforming when (by definition) the standard defines what is conforming.

edited Jan 20 '15 at 17:50 by Philip Couling

answered Apr 08 '10 at 12:29 by Arkku

The C99 text you've quoted is the same as the C89 text, and the FAQ isn't normative. You might be onto something with the argumentation regarding the absence of other definitions. I'll have to look further into that. – janks Apr 08 '10 at 13:09
Edited the answer to address the absence of other definitions. One way to think about it would be to look at other parts of the standard; when there are implementation-defined possibilities involved, it's always explicitly stated. The language in the standard aims to be exact, there's no room for speculating about things left unsaid. – Arkku Apr 08 '10 at 13:30
One may also consider how the definition of *null pointer constant* would look if other possibilities were allowed. It would not say “X is called…” and then give no mention of other possibilities if there were any, because that would allow arbitrary things (like your neighbour's cat) to be called null pointer constants. If there were other options, it would define what exactly *can* be a null pointer constant (e.g. "any implementation-defined integer constant expression or such an expression cast to void”). – Arkku Apr 08 '10 at 13:39
But the standard has plenty of examples of completely restricted implementation-defined behaviour (whether char is signed or unsigned, two choices), as well as completely unrestricted implementation-defined behaviour (maximum number of case labels in a switch, additional forms of `main()` and `main(int, char**)`, representation of floats, etc). Why is there a problem with unbounded lists of arbitrary things? Implementation-defined means the implementation must define them somewhere, so they'll be exhaustively documented by the implementation at the end of the day, no matter what it chooses. – janks Apr 08 '10 at 14:37
That's my point; in each of these implementation-defined cases the standard specifies that they are up to the implementation. With null pointer constants, only the two possibilities are given. The definition of NULL says that it *is* a null pointer constant, but this time explicitly states that the implementor can decide which null pointer constant to use. The definition of null pointer constant does not leave any room to assume that there might be other possibilities. – Arkku Apr 08 '10 at 14:55

Keith Thompson · Answer 2 · 2020-06-23T21:04:58.477

This expands a bit on some of the other answers and makes some points that the others missed.

Citations are to N1570, a draft of the 2011 ISO C standard. I don't believe there have been any significant changes in this area since the 1989 ANSI C standard (which is equivalent to the 1990 ISO C standard). A reference like "7.19p3" refers to subsection 7.19, paragraph 3. (The citations in the question seem to be to the 1989 ANSI standard, which described the language in section 3 and the library in section 4. All editions of the ISO standard describe the language in section 6 and the library in section 7.)

7.19p3 requires the macro NULL to expand to "an implementation-defined null pointer constant".

6.3.2.3p3 says:

An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant.

Since null pointer constant is in italics, that's the definition of the term (3p1 specifies that convention) -- which implies that nothing other than what's specified there can be a null pointer constant. (The standard doesn't always strictly follow that convention for its definitions, but there's no problem assuming that it does so in this case.)

So if we want "something wacky", we need to look at what can be an "integer constant expression".

The phrase null pointer constant needs to be taken as a single term, not as a phrase whose meaning depends on its constituent words. In particular, the integer constant 0 is a null pointer constant, regardless of the context in which it appears; it needn't result in a null pointer value, and it's of type int, not of any pointer type.

An "integer constant expression with the value 0" can be any of a number of things (infinitely many if we ignore capacity limits). A literal 0 is the most obvious. Other possibilities are 0x0, 00000, 1-1, '\0', and '-'-'-'. (It's not 100% clear from the wording whether "the value 0" refers specifically to that value of type int, but I think the consensus is that 0L is also a valid null pointer constant.)
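As a quick sketch of that list, each initializer below is an integer constant expression with the value 0 (or one cast to void *), so each should produce a null pointer. The function name is made up for illustration, and some compilers may warn about the less conventional spellings.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every listed spelling of zero initializes a null pointer. */
int all_entries_are_null(void)
{
    char *p[] = { 0, 0x0, 00000, 1 - 1, '\0', '-' - '-', 0L, (void *)0 };
    size_t n = sizeof p / sizeof p[0];
    size_t i;

    for (i = 0; i < n; i++) {
        if (p[i] != NULL)
            return 0;
    }
    assert(n == 8);   /* eight spellings listed above */
    return 1;
}
```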

Another relevant clause is 6.6p10:

An implementation may accept other forms of constant expressions.

It's not entirely clear (to me) how much latitude this is intended to allow. For example, a compiler might support binary literals as an extension; then 0b0 would be a valid null pointer constant. It might also allow C++-style references to const objects, so that given

const int x = 0;

a reference to x could be a constant expression (it isn't in standard C).

So it's clear that 0 is a null pointer constant, and that it's a valid definition for the NULL macro.

It's equally clear that (void*)0 is a null pointer constant, but it's not a valid definition for NULL, because of 7.1.2p5:

Any definition of an object-like macro described in this clause shall expand to code that is fully protected by parentheses where necessary, so that it groups in an arbitrary expression as if it were a single identifier.

If NULL expanded to (void*)0, then the expression sizeof NULL would be a syntax error.
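Here is a sketch of that parsing problem using two stand-in macros rather than redefining NULL itself (MY_NULL_BARE and MY_NULL_PAREN are hypothetical names). Only the parenthesized form is used in real code; the bare form is left in a comment because it would not compile.

```c
#include <assert.h>
#include <stddef.h>

#define MY_NULL_BARE  (void *)0     /* hypothetical: not protected by parentheses */
#define MY_NULL_PAREN ((void *)0)   /* hypothetical: fully parenthesized */

/* sizeof MY_NULL_BARE would expand to `sizeof (void *)0`, which parses as
   sizeof applied to the type name (void *) followed by a stray 0, so it is
   a syntax error. The parenthesized form groups like a single identifier. */
size_t size_of_parenthesized_null(void)
{
    assert(sizeof MY_NULL_PAREN == sizeof(void *));
    return sizeof MY_NULL_PAREN;
}
```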

So what about ((void*)0)? Well, I'm 99.9% sure that it's intended to be a valid definition for NULL, but 6.5.1, which describes parenthesized expressions, says:

A parenthesized expression is a primary expression. Its type and value are identical to those of the unparenthesized expression. It is an lvalue, a function designator, or a void expression if the unparenthesized expression is, respectively, an lvalue, a function designator, or a void expression.

It doesn't say that a parenthesized null pointer constant is a null pointer constant. Still, as far as I know all C compilers reasonably assume that a parenthesized null pointer constant is a null pointer constant, making ((void*)0) a valid definition for NULL.

What if a null pointer is represented not as all-bits-zero, but as some other bit pattern, for example one equivalent to 0xFFFFFFFF? Then (void*)0xFFFFFFFF, even if it happens to evaluate to a null pointer, is not a null pointer constant, simply because it doesn't satisfy the definition of that term.

So what other variations are permitted by the standard?

Since implementations may accept other forms of constant expression, a compiler could define __null as a constant expression of type int with the value 0, allowing either __null or ((void*)__null) as the definition of NULL. It could also make __null itself a constant of pointer type, but it couldn't then use __null as the definition of NULL, since it doesn't satisfy the definition in 6.3.2.3p3.

An implementation could accomplish the same thing, with no compiler magic, like this:

enum { __null };
#define NULL __null

Here __null is an integer constant expression of type int with the value 0, so it can be used anywhere a constant 0 can be used.
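To try this out in ordinary user code, here is a sketch that uses non-reserved stand-in names (my_null and MY_NULL are hypothetical; a real implementation would use a reserved identifier such as __null and define NULL itself). It shows the enumeration constant working both as a null pointer constant and as a plain int 0.

```c
#include <assert.h>

enum { my_null };            /* hypothetical stand-in: type int, value 0 */
#define MY_NULL my_null      /* hypothetical stand-in for NULL */

/* Returns 1 if MY_NULL behaves like the constant 0 in both contexts. */
int enum_null_behaves_like_zero(void)
{
    int *p = MY_NULL;        /* used as a null pointer constant */
    int x = MY_NULL;         /* used as an ordinary integer 0 */
    int y = MY_NULL + 1;     /* integer arithmetic is allowed too */

    assert(!p);
    return p == 0 && x == 0 && y == 1;
}
```

Some compilers may warn that a non-literal zero is being converted to a pointer, which is exactly the kind of diagnostic such a symbol makes possible.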

The advantage of defining NULL in terms of a symbol like __null is that the compiler could then issue a (perhaps optional) warning if NULL is used in a non-pointer context. For example, this:

char c = NULL; /* PLEASE DON'T DO THIS */

is perfectly legal if NULL happens to be defined as 0; expanding NULL to some recognizable token like __null would make it easier for the compiler to detect this questionable construct.

score 1 · Answer 3 · answered Jan 30 '14 at 18:22

Ages later, but no one brought up this point: Suppose that the implementation does in fact choose to use

#define NULL __builtin_null

My reading of C99 is that that's fine as long as the special keyword __builtin_null behaves as-if it were either "an integral constant expression with value 0" or "an integral constant expression with value 0, cast to void *". In particular, if the implementation chooses the former of those options, then

int x = __builtin_null;
int y = __builtin_null + 1;

is a valid translation unit, setting x and y to the integer values 0 and 1 respectively. If it chooses the latter, of course, both are constraint violations (6.5.16.1, 6.5.6 respectively; void * is not a "pointer to an object type" per 6.2.5p19; 6.7.8p11 applies the constraints for assignment to initialization). And I don't offhand see why an implementation would do this if not to provide better diagnostics for "misuse" of NULL, so it seems likely that it would take the option that invalidated more code.

score 0 · Answer 4 · answered Apr 08 '10 at 11:11

Well, I've found a way to prove that

#define NULL ((void*)-1)

is not a legal definition of NULL.

int main(void) 
{ 
   void (*fp)() = NULL;   
}

Initializing a function pointer with NULL is legal and correct, whereas...

int main(void) 
{ 
   void (*fp)() = (void*)-1;   
}

...is a constraint violation that requires a diagnostic. So that's out.

But the __builtin_magic_null_pointer definition of NULL wouldn't suffer that problem. I'd still like to know if anybody can come up with a reason why it can't be.

answered Apr 08 '10 at 11:11 by janks

Why is your second initialization a constraint violation that requires a diagnostic? *If* a conforming compiler is allowed to announce that `(void*)-1` is a null pointer constant (which I doubt), then my suspicion (without knowing what text you're looking at) is that it would be a legal initialization, because null pointer constants by definition convert to any pointer type, including pointer-to-function. – Steve Jessop Apr 08 '10 at 11:20
You could very well be right about that, heh. I had in mind the first clause of `3.2.2.3 Pointers`, which says `A pointer to void may be converted to or from a pointer to any incomplete or object type...`. A function is neither an incomplete nor an object type. But now I see the final clause of `3.3.16.1 Simple assignment ... Constraints ... One of the following shall hold ... the left operand is a pointer and the right is a null pointer constant`. If the implementation defined ((void*)-1) as a valid null pointer constant, then that would seem to permit it. Reading the standard isn't easy. – janks Apr 08 '10 at 11:52

score 0 · Answer 5 · answered Apr 08 '10 at 11:27

An integral constant expression with the value 0, or such an expression cast to type void * , is called a null pointer constant.

NULL which expands to an implementation-defined null pointer constant;

therefore either

NULL == 0

or

NULL == (void *)0

answered Apr 08 '10 at 11:27 by Erich Kitzmueller

But does the first sentence preclude other forms of the null pointer constant? Or is it the minimum set of null pointer constants that must be recognized by the implementation? – janks Apr 08 '10 at 11:55
@ammoQ: you haven't listed all possibilities. The following are also integral constant expressions with the value 0: `0x0`, `0L`, `(1-1)`, `(12^12)`, and depending on implementation possibly `(2*INT_MIN)`. – Steve Jessop Apr 08 '10 at 12:11
@janks: You are confusing two different things. In the *source code* the only valid value of `NULL` is zero (or zero cast to `void *`). However, this code may *compile* to a different representation, without any need for the machine to actually support zero as an equivalent. – Arkku Apr 08 '10 at 12:14
@Arkku: I'm well aware of the distinction between value and representation. Can you quote any standardese to support the claim that 0 or 0 cast to void* are the ONLY legal forms for the null pointer constant to take in the source code? `3.2.2.3` quoted above says nothing about "only", and doesn't otherwise imply to me that it is an exhaustive list. Is there another part of the standard that you know of that clarifies this? – janks Apr 08 '10 at 12:20
@janks: See my answer for references to the standard. – Arkku Apr 08 '10 at 12:39
Steve: that's why I used the == operator, instead of assuming a #define – Erich Kitzmueller Apr 08 '10 at 16:14
janks: IMO the first sentence precludes other forms of null pointer constants. – Erich Kitzmueller Apr 08 '10 at 16:19
`(void*)0` is not a valid definition of `NULL`, due to C99 7.1.2p5: "Any definition of an object-like macro described in this clause shall expand to code that is fully protected by parentheses where necessary, so that it groups in an arbitrary expression as if it were a single identifier." Without the parentheses, the valid expression `sizeof NULL` would be a syntax error. `#define NULL ((void*)0)` is valid, though an overly pedantic reading of 6.5.1p5, which fails to say that a parenthesized null pointer constant is a null pointer constant, could imply that it isn't. This is unchanged in C11. – Keith Thompson Sep 26 '14 at 18:23

John Bode · Answer 6 · 2010-04-08T14:34:41.303

-2

The null pointer constant must evaluate to 0, otherwise expressions like !ptr would not work as expected.

The NULL macro expands to a 0-valued expression; AFAIK, it always has.

edited Apr 08 '10 at 14:34

answered Apr 08 '10 at 14:10 by John Bode

What would prevent the compiler from re-writing `!ptr` as `ptr == __magic_null` if that is what it used? All that is required of the compiler is that it make !ptr work somehow. It doesn't have to work as-if by treating ptr as an integer. – janks Apr 08 '10 at 14:31
