
Is it undefined behavior to print null pointers with the %p conversion specifier?

#include <stdio.h>

int main(void) {
    void *p = NULL;

    printf("%p", p);

    return 0;
}

The question applies to the C standard, and not to C implementations.

Peter Varo
Dror K.
  • I don't actually think that anyone (including the C committee) cares too much about it. It is a quite artificial problem, with no (or almost no) practical significance. – 0___________ Jul 09 '17 at 12:59
  • Since printf only displays the value and does not touch the pointed-to object (in the sense of reading or writing it), it cannot be UB if the pointer has a valid value for its type (NULL is a **valid** value). – 0___________ Jul 09 '17 at 13:07
  • 3
    @PeterJ let's say what you are saying is true (though clearly the standard states otherwise), the fact alone, that we are debating on this makes the question a valid and correct one, as it looks like the below quoted part of the standard makes it very hard to understand for a regular developer what the heck is going on.. Meaning: the question does not deserve the down vote, because this problem requires clarification! – Peter Varo Jul 09 '17 at 13:13
  • 1
    Related: https://stackoverflow.com/q/10461360/694576 – alk Jul 09 '17 at 13:22
  • No - read the standard: %p is not a pointer operation, and was added only because the pointer size can differ from the integer size. It is the exception to all the pointer rules. – 0___________ Jul 09 '17 at 13:22
  • 2
    @PeterJ that's a different story then, thanks for the clarification :) – Peter Varo Jul 09 '17 at 13:31
  • GCC 5.4 on Ubuntu 16.04 shows: (nil), which refers to the first memory location. Printing p+1 shows: 0x1 – EsmaeelE Jul 11 '17 at 20:07
  • @DrorK. as per the documentation (1): The argument shall be a pointer to void. The value of the pointer is converted to a sequence of printing characters, in an implementation-defined manner. If a conversion specification is invalid, the behavior is undefined. If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined. – Ramankingdom Jul 18 '17 at 11:03
  • See the answer to the linked question - https://stackoverflow.com/questions/10461360/what-is-the-behavior-of-the-conversion-specifier-p-with-null-pointer – Rameshwar Vyevhare Jul 18 '17 at 12:21

3 Answers

93

This is one of those weird corner cases where we're subject to the limitations of the English language and inconsistent structure in the standard. So at best, I can make a compelling counter-argument, as it's impossible to prove it :) [1]


The code in the question exhibits well-defined behaviour.

As [7.1.4] is the basis of the question, let's start there:

Each of the following statements applies unless explicitly stated otherwise in the detailed descriptions that follow: If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, [... other examples ...]) [...] the behavior is undefined. [... other statements ...]

This is clumsy language. One interpretation is that the items in the list are UB for all library functions, unless overridden by the individual descriptions. But the list starts with "such as", indicating that it's illustrative, not exhaustive. For example, it does not mention correct null-termination of strings (critical for the behaviour of e.g. strcpy).

Thus it's clear the intent/scope of 7.1.4 is simply that an "invalid value" leads to UB (unless stated otherwise). We have to look to each function's description to determine what counts as an "invalid value".

Example 1 - strcpy

[7.21.2.3] says only this:

The strcpy function copies the string pointed to by s2 (including the terminating null character) into the array pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.

It makes no explicit mention of null pointers, yet it makes no mention of null terminators either. Instead, one infers from "string pointed to by s2" that the only valid values are strings (i.e. pointers to null-terminated character arrays).

Indeed, this pattern can be seen throughout the individual descriptions. Some other examples:

  • [7.6.4.1 (fegetenv)] store the current floating-point environment in the object pointed to by envp

  • [7.12.6.4 (frexp)] store the integer in the int object pointed to by exp

  • [7.19.5.1 (fclose)] the stream pointed to by stream

Example 2 - printf

[7.19.6.1] says this about %p:

p - The argument shall be a pointer to void. The value of the pointer is converted to a sequence of printing characters, in an implementation-defined manner.

Null is a valid pointer value, and this section makes no explicit mention that null is a special case, nor that the pointer has to point at an object. Thus it is defined behaviour.
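As a minimal sketch of this reading, the helper below formats a pointer with %p. The name `format_pointer` is hypothetical and not part of any library. The one requirement the description does spell out is the argument type, so the pointer is converted to `void *` explicitly; the text produced is implementation-defined, and whether a null argument is permitted is exactly the point under dispute here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats `ptr` with %p into `buf`.
 * Returns whatever snprintf returns (the character count, or a
 * negative value on error). The resulting text is
 * implementation-defined, so callers should not parse it. */
int format_pointer(char *buf, size_t size, int *ptr)
{
    assert(buf != NULL && size > 0);

    /* %p requires a pointer to void. Variadic arguments are not
     * converted automatically, so the cast is needed for any other
     * pointer type. */
    return snprintf(buf, size, "%p", (void *)ptr);
}
```

Calling `format_pointer(buf, sizeof buf, NULL)` is the case discussed in the question; the helper deliberately does not special-case it.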


[1] Unless a standards author comes forward, or unless we can find something similar to a rationale document that clarifies things.

Oliver Charlesworth
  • Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackoverflow.com/rooms/148735/discussion-on-answer-by-oliver-charlesworth-printing-null-pointers-with-p-is-un). – Bhargav Rao Jul 09 '17 at 17:25
  • 1
    "yet it makes no mention of null terminators" is weak in Example 1 - strcpy as the spec says "copies the _string_". _string_ is explicitly defined as having a _null character_. – chux - Reinstate Monica Jul 09 '17 at 22:54
  • 1
    @chux - That's somewhat my point - one has to *infer* what's valid/invalid from context, rather than assume that the list in 7.1.4 is exhaustive. (However, the existence of this part of my answer made somewhat more sense in the context of comments that have since been deleted, arguing that strcpy was a counterexample.) – Oliver Charlesworth Jul 09 '17 at 22:56
  • @OliverCharlesworth You said: "One interpretation is that a null pointer is always UB" ... but the very same paragraph `§7.1.4 p1` says: "Each of the following statements applies unless explicitly stated otherwise in the detailed descriptions that follow" – Dror K. Jul 10 '17 at 07:23
  • 1
    The crux of the issue is how the reader will interpret _such as_. Does it mean _some examples of **possible** invalid values are_? Does it mean _some examples which are **always** invalid values are_? For the record, I go with the first interpretation. – ninjalj Jul 10 '17 at 09:32
  • @DrorK. - Yes, I was wondering about that myself. Rather than tie myself in knots trying to untangle that, I've just cut out the `free`/`malloc` stuff from my answer for now. – Oliver Charlesworth Jul 10 '17 at 09:43
  • 1
    @ninjalj - Yes, agreed. That's essentially what I'm trying to convey in my answer here, i.e. "these are examples of the types of thing that might be invalid values". :) – Oliver Charlesworth Jul 10 '17 at 09:44
  • @ninjalj If we follow the first interpretation you described, then the puzzle piece in `§7.24.1 p2` no longer fits: `pointer arguments on such a call shall still have valid values, as described in 7.1.4`. – Dror K. Jul 10 '17 at 12:09
  • @OliverCharlesworth [DR #217](http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_217.htm): In C99 the asctime function didn't specify the expected ranges, the committee quoted `§7.1.4 p1` as a response and didn't issue a TC for C99. So the interpretation that `§7.1.4 p1` doesn't specify what is considered to be invalid, doesn't agree with their response, and it doesn't agree with the statements in: `§7.24.1 p2`, `§7.22.5 p1`, `§7.29.4 p2`. – Dror K. Jul 10 '17 at 12:40
  • @OliverCharlesworth: To be fair: the comments have not been deleted, but moved to chat. I'm fine with some of them being moved, but others very well were relevant for the answer. But it might be too much effort to tell them apart. Nevertheless the chat is linked above. – too honest for this site Jul 10 '17 at 14:03
21

The Short Answer

Yes. Printing null pointers with the %p conversion specifier has undefined behavior. Having said that, I'm unaware of any existing conforming implementation that would misbehave.

The answer applies to any of the C standards (C89/C99/C11).


The Long Answer

The %p conversion specifier expects an argument of type pointer to void; the conversion of the pointer to printable characters is implementation-defined. The description doesn't state that a null pointer is expected.

The introduction to the standard library functions states that null pointers as arguments to (standard library) functions are considered to be invalid values, unless it is explicitly stated otherwise.

C99 / C11 §7.1.4 p1

[...] If an argument to a function has an invalid value (such as [...] a null pointer, [...]) [...] the behavior is undefined.

Examples for (standard library) functions that expect null pointers as valid arguments:

  • fflush() uses a null pointer for flushing "all streams" (that apply).
  • freopen() uses a null pointer for indicating the file "currently associated" with the stream.
  • snprintf() allows passing a null pointer when 'n' is zero.
  • realloc() uses a null pointer for allocating a new object.
  • free() allows passing a null pointer.
  • strtok() uses a null pointer for subsequent calls.
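Some of the documented null-pointer cases in the list above can be sketched as follows. The helper names `join_key_value` and `count_tokens` are made up for illustration; only the standard library calls inside them come from the list.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated "key=value" string,
 * or NULL on failure. The caller releases it with free(). */
char *join_key_value(const char *key, int value)
{
    assert(key != NULL); /* %s is given no null-pointer exception */

    /* snprintf with a null buffer and n == 0 only reports the length. */
    int len = snprintf(NULL, 0, "%s=%d", key, value);
    if (len < 0)
        return NULL;

    /* realloc with a null pointer allocates a new object, like malloc. */
    char *out = realloc(NULL, (size_t)len + 1);
    if (out == NULL)
        return NULL;

    snprintf(out, (size_t)len + 1, "%s=%d", key, value);
    return out;
}

/* Hypothetical helper: counts comma-separated tokens in `text`,
 * modifying the buffer in place as strtok does. */
size_t count_tokens(char *text)
{
    assert(text != NULL);

    size_t count = 0;
    /* First call passes the string; later calls pass a null pointer
     * to continue scanning the same string. */
    for (char *tok = strtok(text, ","); tok != NULL; tok = strtok(NULL, ","))
        count++;
    return count;
}
```

Each null pointer here is one that the function's own description explicitly permits, as are `free(NULL)` and `fflush(NULL)`.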

If we take the case for snprintf(), it makes sense to allow passing a null pointer when 'n' is zero, but this is not the case for other (standard library) functions that allow a similar zero 'n'. For example: memcpy(), memmove(), strncpy(), memset(), memcmp().

It is not only specified in the introduction to the standard library, but also once again in the introduction to these functions:

C99 §7.21.1 p2 / C11 §7.24.1 p2

Where an argument declared as size_t n specifies the length of the array for a function, n can have the value zero on a call to that function. Unless explicitly stated otherwise in the description of a particular function in this subclause, pointer arguments on such a call shall still have valid values as described in 7.1.4.
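Under the reading given in this answer, code that may hold null pointers together with a zero length can avoid the question by not making the call at all. A minimal sketch, assuming a hypothetical wrapper named `copy_bytes`:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper around memcpy: skips the call when n is zero,
 * so null pointers are never passed to memcpy. Returns dst. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    if (n == 0)
        return dst; /* nothing to copy; dst and src may be null here */

    /* For a non-zero length both pointers must refer to objects. */
    assert(dst != NULL && src != NULL);
    return memcpy(dst, src, n);
}
```

With this wrapper, `copy_bytes(NULL, NULL, 0)` returns a null pointer without ever reaching memcpy.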


Is it intentional?

I don't know whether the UB of %p with a null pointer is in fact intentional. However, the standard explicitly states that null pointers are considered invalid values as arguments to standard library functions, then explicitly specifies the cases where a null pointer is a valid argument (snprintf, free, etc.), and then repeats the requirement for arguments to be valid even in zero 'n' cases (memcpy, memmove, memset). Given that, I think it's reasonable to assume that the C standards committee isn't too concerned with having such things undefined.
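For code that should not depend on either interpretation, one option is to never hand a null pointer to %p. The sketch below assumes a hypothetical helper named `format_pointer_checked`; the "(null)" text is an arbitrary choice for illustration, not something the standard prescribes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes a textual form of `ptr` into `buf`
 * without ever passing a null pointer to %p. Returns the value
 * returned by snprintf. */
int format_pointer_checked(char *buf, size_t size, void *ptr)
{
    assert(buf != NULL && size > 0);

    if (ptr == NULL)
        return snprintf(buf, size, "%s", "(null)"); /* arbitrary label */

    /* Non-null case: the output is still implementation-defined. */
    return snprintf(buf, size, "%p", ptr);
}
```

The branch costs one comparison and makes the null case independent of how 7.1.4 is read.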

Peter Varo
Dror K.
  • Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackoverflow.com/rooms/148734/discussion-on-answer-by-dror-k-printing-null-pointers-with-p-is-undefined-beha). – Bhargav Rao Jul 09 '17 at 17:24
  • The second quote you mention is quite plainly inapplicable, since it deals with passing arrays in conjunction with `size_t` arguments (which is what "such a call" refers to). Our null pointer print attempt is not such a call and it is not valid to generalize the rules for these particular calls to any function taking a pointer. – Jeroen Mostert Jul 09 '17 at 18:07
  • @JeroenMostert Should I assume that by not saying anything about the first quote, that you agree with what it says? BTW, you're welcome to join the chat ^ – Dror K. Jul 09 '17 at 18:11
  • @DrorK.: no, you should not. In language lawyer arguments, it's always best to first focus on the obvious and strip away what is plainly incorrect/irrelevant, before you get to the heart of the matter. I see how one could reasonably argue about 7.1.4, which I will not do at present. – Jeroen Mostert Jul 09 '17 at 18:15
  • 1
    @JeroenMostert: What is the intent of this argument? The given quote of 7.1.4 is rather clear, is it not? What is there to argue about _"unless explicitly stated otherwise"_ when it is _not being_ stated otherwise? What is there to argue about the fact that the (unrelated) string function library has a similar wording, so the wording does not seem to be accidental? I think this answer (while not really useful _in practice_) is as correct as can be. – Damon Jul 10 '17 at 11:26
  • @DrorK: Regarding your "I don't know if it's intentional" paragraph: It almost certainly is intentional, for compatibility reasons. I have admittedly never seen such hardware, but _folk_ says that in the land of myth and legends, there exist or _used to exist_ CPUs where you cannot store (store, not dereference!) an invalid pointer in an address register, otherwise you will trigger a hardware exception. Thus, you could conceivably call `printf` with an invalid pointer _on the stack_, and then... bang. – Damon Jul 10 '17 at 11:55
  • 3
    @Damon: Your mythical hardware isn't mythical, there are plenty of architectures where values that do not represent valid addresses may not be loaded in address registers. Passing null pointers as function arguments is still required to work on those platforms as a general mechanism, however. Merely putting one on the stack will not blow things up. – Jeroen Mostert Jul 10 '17 at 12:55
  • @JeroenMostert That's interesting! I only know x86, where address registers can contain any value. Do you have an example hardware that would complain about bad value in an address register? – anatolyg Jul 11 '17 at 13:35
  • @anatolyg It might be more suited to ask such questions in the chat room – Dror K. Jul 11 '17 at 13:40
  • 1
    @anatolyg: On x86 processors, addresses have two parts--a segment and an offset. On the 8086, loading a segment register is like loading any other, but on all later machines it fetches a segment descriptor. Loading an invalid descriptor causes a trap. A lot of code for 80386 and later processors, however, only uses one segment, and thus never loads segment registers *at all*. – supercat Jul 11 '17 at 16:47
  • downvoted - self-answered SO questions is not the place to attempt to highlight what is, at best, a minor defect in wording in the standard. Instead file a defect report if you really think this is a problem. – M.M Jul 13 '17 at 20:50
  • @M.M Thanks for commenting, I assume that it's considered 'polite' to state the reason for down-voting an answer, but since your reasoning has nothing to do with the answer, it seems to me to be misplaced. Aren't there more suited places on SO for such comments? If not, then once again- thanks for letting me know what's your personal opinion about what should or shouldn't be on SO. Also, thanks for your suggestion to file a defect report, but if anything- my answer makes the case for it *not* being a defect. – Dror K. Jul 14 '17 at 02:54
  • 1
    I think everyone would agree that printing a null pointer with `%p` is not supposed to be undefined behaviour – M.M Jul 14 '17 at 03:09
  • @M.M "self-answered SO questions is not the place to attempt to highlight what is, at best, a minor defect", please prove it. – Stargateur Jul 14 '17 at 09:45
  • I agree with this answer. The only thing I don't understand is how can one accept an answer that is opposite to their own? – rustyx Aug 30 '18 at 16:55
  • @M.M: The behavior of `%p` is Implementation-Defined for non-null values, and an implementation could satisfy the documentation requirement even if it specified a manner of conversion that would yield unpredictable behavior for some or all such values. Any implementation that would be conforming if using `%p` with a null pointer was UB could be made conforming even if such action was IDB, merely by having its documentation specify a manner of conversion that would result in unpredictable behavior when given a null pointer. Categorizing the behavior as IDB wouldn't really change anything. – supercat Oct 16 '18 at 20:01
-1

The authors of the C Standard made no effort to exhaustively list all of the behavioral requirements an implementation must meet to be suitable for any particular purpose. Instead, they expected that people writing compilers would exercise a certain amount of common sense whether the Standard requires it or not.

The question of whether something invokes UB is seldom in and of itself useful. The real questions of importance are:

  1. Should someone who is trying to write a quality compiler make it behave in predictable fashion? For the described scenario the answer is clearly yes.

  2. Should programmers be entitled to expect that quality compilers for anything resembling normal platforms will behave in predictable fashion? In the described scenario, I would say the answer is yes.

  3. Might some obtuse compiler writers stretch the interpretation of the Standard so as to justify doing something weird? I would hope not, but wouldn't rule it out.

  4. Should sanitizing compilers squawk about the behavior? That would depend upon the paranoia level of their users; a sanitizing compiler probably shouldn't default to squawking about such behavior, but perhaps should provide a configuration option to do so in case programs might be ported to "clever"/dumb compilers that behave weirdly.

If a reasonable interpretation of the Standard would imply a behavior is defined, but some compiler writers stretch the interpretation to justify doing otherwise, does it really matter what the Standard says?

supercat
  • 1. It's not uncommon for programmers to find the assumptions made by modern/aggressive optimizers to be at odds with what they consider to be "reasonable" or "quality". 2. When it comes to ambiguities in the specification, it's not uncommon for implementors to be at odds as to what liberties they may assume. 3. When it comes to members of the C standards committee, even they don't always agree as to what the 'correct' interpretation is, let alone what it *should* be. Given the aforementioned, whose reasonable interpretation should we follow? – Dror K. Jul 10 '17 at 19:51
  • 7
    Answering the question "does this particular piece of code invoke UB or not" with a dissertation on what you think about the usefulness of UB or how compilers ought to behave is a poor attempt at an answer, especially since you can copy-paste this as an answer to almost *any* question about particular UB. As a rejoinder to your rhetorical flourish: yes, it really matters what the Standard says, no matter what some compiler writers do or what you think of them for doing that, because the Standard is what both programmers and compiler writers start from. – Jeroen Mostert Jul 10 '17 at 19:51
  • 1
    @JeroenMostert: The answer to "Does X invoke Undefined behavior" will often depend upon what one means by the question. If a program is regarded as having Undefined Behavior if the Standard would impose no requirements on the behavior of a conforming implementation, then nearly all programs invoke UB. The authors of the Standard clearly allow implementations to behave in arbitrary fashion if a program nests function calls too deeply, so long as an implementation can correctly process at least one (possibly contrived) source text that exercises the translation limits in the Standard. – supercat Jul 10 '17 at 20:47
  • @supercat: very interesting, but is `printf("%p", (void*) 0)` undefined behavior or not, according to the Standard? Deeply nested function calls are as relevant to this as the price of tea in China. And yes, UB is very common in real-world programs -- what of it? – Jeroen Mostert Jul 10 '17 at 20:55
  • 1
    @JeroenMostert: Since the Standard would allow an obtuse implementation to regard almost any program as having UB, what should matter will be the behavior of non-obtuse implementations. In case you didn't notice, I didn't just write a copy/paste about UB, but answered the question about `%p` for each possible meaning of the question. – supercat Jul 10 '17 at 20:56
  • @supercat: well, that's where we soundly disagree, as it seems to me you're missing the most obvious, common meaning of the question, which is definitely addressed by the other two answers. But I suspect this is a difference of opinion where further discussion would not improve upon matters in any way, so I'll leave it at that. – Jeroen Mostert Jul 10 '17 at 21:01