
I'm creating a modified printf implementation, and I'm not sure about the answers to these questions.

  1. Does zero work as a null string? (Is printf("%s", 0) allowed?)

    I'm guessing no, because 0 is an int. But that raises the next question:

  2. Does NULL work as a null string? (Is printf("%s", NULL) allowed?)

    Logically, I think it should be yes, because NULL implies a pointer; but a lot of implementations seem to have #define NULL 0, so I feel in practice it might be no. Which is correct?

  3. Does the pointer type have to point to char? (Is printf("%s", (void const *)"") allowed?)

    My guess is that the type doesn't matter, but I'm not sure.

user541686
  • Note that if you're creating your own implementation, you might want to attempt to support these usages even though they're all UB (see my answer as to why they're UB). There's a decent amount of broken software out there that assumes they work... :-( – R.. GitHub STOP HELPING ICE Aug 31 '12 at 21:20
  • I am so glad that C++11 added `nullptr` – Blastfurnace Aug 31 '12 at 21:20
  • Posix [exec](http://pubs.opengroup.org/onlinepubs/009695399/functions/exec.html) is an example of a varargs function that *must* be called with an argument `(char *) NULL`. – ecatmur Aug 31 '12 at 21:28
  • Related to R..'s note for those creating implementations: even if you define `NULL` to be an integer, it is *strongly advisable* to make it an integer that's the same size as a pointer, unless you're deliberately creating a debugging implementation to catch obscure UB. And also to make the representation of a null pointer all zeros. Users *will* erroneously pass `NULL` as a vararg, thinking it's a pointer. In C `NULL` in fact does *not* imply a pointer, despite that being the sole reason people use it. Bjarne Stroustrup's remarks on `NULL` persuaded me that it's broken; I use `0` instead. – Steve Jessop Sep 01 '12 at 01:02
  • NULL question subset only: http://stackoverflow.com/questions/11589342/what-is-the-behavior-of-printing-null-with-printfs-s-specifier?lq=1 – Ciro Santilli OurBigBook.com Jul 20 '15 at 09:21

3 Answers


Case 1 is undefined behavior because the type of the argument (int) does not match the type required by the format specifier (char *).

Case 2 is undefined behavior for the same reason. NULL is allowed to be defined as any integer constant expression with value 0, or such an expression cast to (void *). None of these types are char *, so the behavior is undefined.

Case 3 is undefined behavior for the same reason. "" yields a valid pointer to a null-terminated character array (string), but when you cast it to const void *, it no longer has the right type to match the format string. Thus the behavior is undefined.

R.. GitHub STOP HELPING ICE
  • Looking at 7.21.6.1:8 (*an array of character type*), I think `unsigned char *` and `signed char *` would also be OK. – ecatmur Aug 31 '12 at 21:27
  • According to section 6.2.5.27, "A pointer to void shall have the same representation and alignment requirements as a pointer to a character type.39) Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements." It's always valid to convert `void*` to `char*`, so #3 is valid. #2 is still invalid, but for a different reason. – Sergey Kalinichenko Aug 31 '12 at 21:27
  • as @dasblinkenlight wrote, case 3 is strictly conforming. – ouah Aug 31 '12 at 21:32
  • @dasblinkenlight: The cited text has no bearing on `printf`, which is not specified in terms of representation. The very same came up here before: http://stackoverflow.com/questions/4664100/does-printfx-1-invoke-undefined-behavior; the C standard seems to mandate that `printf("%x",1);` is undefined behavior, because the rules for `va_arg` that make an `int` acceptable in place of `unsigned int` when the value would fit in either are never applied to `printf`. Thankfully this is probably just a bug in the standard, so you could say that the `(const void *)""` issue is also a bug... – R.. GitHub STOP HELPING ICE Aug 31 '12 at 21:35
  • The relevant text is 7.19.6: *If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.* Just having the same representation is not enough; the standard requires the "correct type". – R.. GitHub STOP HELPING ICE Aug 31 '12 at 21:36
  • C also specifically says that the argument for a %s specifier must be a pointer to an element in a character array. (Which a null pointer is not) – nos Aug 31 '12 at 21:49
  • @R. it's hard to find a consensus on this. This is the same question as with `%p`: is the cast to `(void *)` required when the argument is a `(char *)`? Let's say that with the `(void *)` cast, we are always on the safe side. – ouah Aug 31 '12 at 22:01
  • @R..: The `%s` format specifier doesn't actually *say* what type it requires, just that the argument must be "a pointer to the initial element of an array of character type". Again, probably a bug in the standard. Presumably all of `char *`, `const char*`, `unsigned char*` are permitted provided that they contain the address of such an element. A `void*` that contains the address of a character clearly is not a "pointer to char", but I'm not sure that it clearly isn't a pointer to an element in a character array. The precise wording of `printf` is silly. – Steve Jessop Sep 01 '12 at 00:39
  • Also, come to think of it, I'm not certain off-hand that a pointer to the *second* character in an array can clearly be said to be a pointer to the initial element of an array. Is there some umbrella text somewhere, that says "whenever any function talks about a pointer to the first element of an array, it's OK to give it a pointer to any element of an array and it will act as though the preceding part of the array doesn't exist", or "every contiguous subset of an array is an array", or some such? – Steve Jessop Sep 01 '12 at 00:41
  • Next question: is `printf("%p", (const void*)0);` defined behavior (defined other than the printed representation of a null pointer in the implementation, I mean)? The type of the argument is not "pointer to void", it is "pointer to const void", so presumably this is also UB even though the types have the same representation. – Steve Jessop Sep 01 '12 at 00:51
  • @SteveJessop: A subarray of an array is an array. I'm not sure that's explicitly stated anywhere but it seems clear from the representation of arrays. And about the `%s` format specifier not documenting the type it requires, yes this is a bug in the standard. Presumably `(int *)aligned_string` would be allowed by the letter of the standard as written, even on machines where `int *` has different size and representation than `char *`, which is of course basically impossible to support... – R.. GitHub STOP HELPING ICE Sep 01 '12 at 00:59
  • It's certainly clear without explicit statement, that a subarray of an array has the same representation as an array. Does that mean it "is" one? ;-) (In case it isn't obvious I don't strictly need an answer to that -- either it is, or else this is a defect in the specification of `%s` so glaring that no good-faith implementer could conceivably interpret it other than the way printf has always worked in every implementation. IMO "sequence of characters" would be better, to match the definition of a string in 7.1.1) – Steve Jessop Sep 01 '12 at 01:06
  • Somewhere the standard specifies that every object is an array of length one of its own type, so at least that provides one example of a certain subarray being an array. :-) – R.. GitHub STOP HELPING ICE Sep 01 '12 at 01:29
  • I remember it says that *for the purpose of pointer arithmetic*, an object is an array of length one (and so in particular you can form an off-the-end pointer for it), in the definition of additive operators. I don't know whether anything defines that it is *for the purpose of printf*, that might have to go without saying. – Steve Jessop Sep 01 '12 at 18:13

I believe it would compile just fine, but the behavior is undefined.

Here is a bit about how printf works and why it is considered unsafe. printf takes as many arguments as you give it; only the first one (the format string) is required. The function itself does not check the types of the remaining arguments. It typically just reads raw bytes from wherever the arguments were passed and interprets them according to the format string.

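Printing a string is more complicated, because printf keeps reading until it finds a 0 byte ('\0'). To see the effect of a mismatch, you can try it with integers. On a typical platform, int is 4 bytes and long long is 8. If you told printf to print a long long and passed two ints, it might treat them as one long long. Or if you passed a long long and told it to print an int, it might take 4 of the bytes and use them for printing. (Arguments of type short are promoted to int before the call, so two shorts arrive as two ints.) Either way the behavior is undefined, so none of these outcomes is guaranteed.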

With that in mind, these specific cases would probably (I didn't test) print nothing, but the behavior is still undefined. If the values weren't 0, printf might print some characters if the value, interpreted as an address, happened to point at memory beginning with a few non-'\0' bytes.

Not quite sure if it helps but hope so.

Pijusn

From the online C11 draft:

7.21.6.1 The fprintf function

...
s If no l length modifier is present, the argument shall be a pointer to the initial element of an array of character type.280) Characters from the array are written up to (but not including) the terminating null character. If the precision is specified, no more than that many bytes are written. If the precision is not specified or is greater than the size of the array, the array shall contain a null character.
280) No special provisions are made for multibyte characters.

Anything other than a pointer to the first element of an array of char containing at least 1 character (the 0 terminator) invokes undefined behavior.

If you're building your own implementation, you can certainly define your own behavior for 0 or NULL.

Oh, and as far as the definition of NULL is concerned:

6.3.2.3 Pointers

...
3 An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant.66) If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
66) The macro NULL is defined in <stddef.h> (and other headers) as a null pointer constant; see 7.19

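Basically, any 0-valued integer constant expression in a pointer context is converted to a null pointer. The variable arguments of printf are not a pointer context, though, so a bare 0 or NULL passed there keeps whatever type it has.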

John Bode