
I recently read that

unsigned char x=1;
printf("%u",x);

invokes undefined behaviour because, with the format specifier %u, printf expects an unsigned int. Still, I would like to understand what is going on in this example.

I think that the integral promotion rules apply with the expression printf("%u",x) and the value represented by x.

A.6.1 Integral Promotion

A character, a short integer, or an integer bit-field, all either signed or not, or an object of enumeration type, may be used in an expression wherever an integer may be used. If an int can represent all the values of the original type, then the value is converted to int; otherwise the value is converted to unsigned int. This process is called integral promotion.

What does "may be used" mean here? Does it mean 'is syntactically correct' or 'is defined behaviour'?

And how is x promoted in this example? I have read that it is promoted to an int, but if printf("%u", (int)x) is still undefined behaviour, then I don't really understand why...

Deduplicator
lee77
  • I think the behavior is indeed defined, for the very reasons you mention. – luser droog Oct 18 '13 at 08:28
  • [Recommended viewing](http://channel9.msdn.com/Series/C9-Lectures-Stephan-T-Lavavej-Core-C-/Stephan-T-Lavavej-Core-C-7-of-n) – Kerrek SB Oct 18 '13 at 09:08
  • @luserdroog: So you think the "may be used" means that the behaviour should be defined? Or am I missing your point? – lee77 Oct 18 '13 at 09:09
  • @lee77 Yes. You are correct. "may be used" means it *is* allowed. It is guaranteed by the standard to work correctly. Good job thinking critically. Where did you find the claim that it was undefined? – luser droog Oct 18 '13 at 09:30
  • @luser droog: I found it in the penultimate answer to a question on stackoverflow (scroll down a little): http://stackoverflow.com/questions/15736497/print-unsigned-char-in-c – lee77 Oct 18 '13 at 09:41
  • Ah. What he says is true if it's just `char` and the implementation is implicitly *signed*. Here you say `unsigned char` so it's all within range. – luser droog Oct 18 '13 at 09:50

3 Answers


Since printf uses a variable argument list, the integer promotions are applied to its integer arguments. In any normal C implementation, the integer promotions convert an unsigned char to an int. Then you are formatting an int with a specifier for unsigned int, so the behavior is undefined.

There is no conflict between saying that a character may be used where an integer may be used and the fact that your statement has behavior not defined by the C standard. Although you may use a character in place of an integer, the rules about what may be printed with %u still apply. If using a character results in an integer appropriate for the specifier, the behavior is defined. If using a character results in an integer inappropriate for the specifier, the behavior is not defined by the C standard.

Discussion elsewhere on Stack Overflow concluded that an exotic C implementation might in theory conform to the C standard while having char types (plain, signed, and unsigned) as wide as int types. In such an implementation, an int could not represent all values of an unsigned char, so an unsigned char would have to be promoted to an unsigned int. However, such an implementation would be exotic and troublesome (notably with handling EOF), and you may ignore it in practice.

Eric Postpischil
  • So there are two steps: First, the integer promotion takes place since x represents an integer and printf uses a variable argument list. Now there are two possibilities: 1) int cannot represent all the values that an unsigned char can (which would be rather exotic). Then x is promoted to an unsigned int, which is defined. 2) int can represent all the values that an unsigned char can, so x is promoted to an int. In this case, wouldn't there be (within printf, while resolving the va_arg macro) a definition `unsigned int = `? Which would be defined as well? – lee77 Oct 18 '13 at 10:50
  • FYI, the "discussion elsewhere on SO" is here: http://stackoverflow.com/questions/4664100/does-printfx-1-invoke-undefined-behavior – R.. GitHub STOP HELPING ICE Oct 18 '13 at 10:55
  • @lee77: No, there is not necessarily any such assignment. When the `printf` implementation sees the `%u` specifier, it will attempt to get the bytes for an `unsigned int` argument. The means by which it does that are up to the implementation. It does not need to be through a C assignment expression. It can be pure assembly or machine language. The only rule you can rely on is this: If you use an `unsigned int` specifier, you must pass an `unsigned int`. – Eric Postpischil Oct 18 '13 at 11:41
  • @Eric: Okay, now I understand. Thanks a lot for making things clear. – lee77 Oct 18 '13 at 12:05
  • @EricPostpischil: Aren't `int` and `unsigned int` guaranteed to be layout-compatible? – Kerrek SB Oct 18 '13 at 12:09
  • @KerrekSB: They are not guaranteed to be passed in the same place when a function is called. (C 2011 [N1570] 6.9.1 9: “The layout of the storage for parameters is unspecified.”) And you are not guaranteed that the optimizer will not recognize that the behavior is undefined and will respond by eliminating the code or substituting whatever it finds convenient. It is unlikely a normal C implementation would pass `int` and `unsigned int` differently, but the optimizer is less predictable. – Eric Postpischil Oct 18 '13 at 12:40
  • @KerrekSB: There was a question yesterday asking about `1 << -1`, so I compiled `#include <stdio.h>`, `int main(void) { printf("%d.\n", 1 << -1); return 0; }` with Apple clang 5.0 for x86_64 with -O3 and looked at the assembly. Guess what clang put in the register for the second `printf` argument, `%rsi`? Nothing! The assembly did nothing at all with the register. The optimizer recognized the expression was undefined and simply generated no code whatsoever for it. – Eric Postpischil Oct 18 '13 at 12:46
  • For the same reason, the line `printf("%u",1)` yields undefined behaviour: 1 is promoted to an int (since int can always represent 1), so the situation is as before. Or am I missing something here? It's really astonishing how easy it is to produce undefined behaviour. In future I will code it like this: `printf("%u", (unsigned int)1)` – lee77 Oct 19 '13 at 09:30
  • @lee77: That is correct. The ease of encountering undefined behavior in certain situations is a deficiency of the language. – Eric Postpischil Oct 19 '13 at 10:27
  • Out of curiosity, has there ever actually been any production compiler where neither `int` nor `unsigned int` had any padding bits, where `int` was passed in a fashion sufficiently different from `unsigned int` that `%X` wouldn't "naturally" work? If not, is there any reason why the Standard shouldn't define `%X` to be usable on either `int` or `unsigned int`? – supercat Jul 06 '15 at 22:09
  • @EricPostpischil: The case of `signed int` vs `unsigned int` mismatch in varargs is specifically permitted: "If there is no actual next argument, or if type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions), the behavior is undefined, except for the following cases: — **one type is a signed integer type, the other type is the corresponding unsigned integer type, and the value is representable in both types;** — one type is pointer to void and the other is a pointer to a character type." – Ben Voigt Sep 11 '15 at 18:41

If your platform's int can represent all values that an unsigned char can, then the promotion is to int, otherwise to unsigned int. So it depends on your platform.

As to "why", that's because you're passing x as a variable argument, and the rules for variable arguments say that the default argument promotions take place (presumably so as to simplify the implementation).

Kerrek SB

Since printf uses a variable argument list, it will be unpacked via va_arg. C++ refers to the C standard for va_arg rules. The C99 Standard says the following:

The va_arg macro expands to an expression that has the specified type and the value of the next argument in the call. The parameter ap shall have been initialized by the va_start or va_copy macro (without an intervening invocation of the va_end macro for the same ap). Each invocation of the va_arg macro modifies ap so that the values of successive arguments are returned in turn. The parameter type shall be a type name specified such that the type of a pointer to an object that has the specified type can be obtained simply by postfixing a * to type. If there is no actual next argument, or if type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions), the behavior is undefined, except for the following cases:

  • one type is a signed integer type, the other type is the corresponding unsigned integer type, and the value is representable in both types;
  • one type is pointer to void and the other is a pointer to a character type.

Clearly, integer promotions are taken into account when determining whether the actual and expected type match. And signed vs unsigned mismatch is covered by the first bullet point.

Since x = 1 is certainly a value representable by unsigned int, and promotion of unsigned char generates either signed int (if INT_MAX >= UCHAR_MAX) or unsigned int (if INT_MAX < UCHAR_MAX), this is perfectly legal.

Community
Ben Voigt