Assuming the following:

sizeof(char) = 1
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 8

The printf format for a 2 byte signed number is %hd, for a 4 byte signed number is %d, for an 8 byte signed number is %ld, but what is the correct format for a 1 byte signed number?

CHRIS

1 Answer

what is the correct format for a 1 byte signed number?

%hh and the integer conversion specifier of your choice (for example, %02hhX). See the C11 standard, §7.21.6.1p5:

hh

Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing);…

The parenthesized comment is important. Because of integer promotions on the arguments to variadic functions (such as printf), the function never sees a char argument. Many programmers think that that means that it is unnecessary to use h and hh qualifiers. Certainly, you are not creating undefined behaviour by leaving them out, and most of the time it will work.

However, char may well be signed, and the integer promotion will preserve its value, which will make it into a signed integer. Printing the signed integer out with an unsigned format (such as %02X) will present you with the sign-extended Fs. So if you want to display signed char using an unsigned format, you need to tell printf what the original unpromoted width of the integer type was, using hh.

In case that wasn't clear, a simple (but controversial) example:

/* Read the comments thread to this post; I'll remove
   this note when I edit the outcome of the discussion into
   the answer
 */

#include <stdio.h>
int main(void) {
  char* s = "\u00d1"; /* Ñ */
  for (char* p = s; *p; ++p) printf("%02X (%02hhX)\n", *p, *p);
  return 0;
}

Output:

$ ./a.out
FFFFFFC3 (C3)
FFFFFF91 (91)

In the comment thread, there is (or possibly was) considerable discussion about whether the above snippet is undefined behaviour because the X format specification requires an unsigned argument, whereas the char argument is (at least on the implementation which produced the presented output) signed. I think this argument relies on §7.21.6.1/p9: "If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined."

However, in the case of char (and short) integer types, the expression in the argument list is promoted to int or unsigned int before the function is called. (It's worth noting that on most architectures, all three character types will be promoted to a signed int; promotion of an unsigned char (or of a plain char that is unsigned) to an unsigned int will only happen on an implementation where sizeof(int) == 1.)

So on most architectures, the argument to an %hx or an %hhx format conversion will be signed, and that cannot be undefined behaviour without rendering the use of these format codes meaningless.

Furthermore, the standard does not say that fprintf (and friends) will somehow recover the original expression. What it says is that the value "shall be converted to signed char or unsigned char before printing" (§7.21.6.1/p5, quoted above, emphasis added).

Converting a signed value to an unsigned value is not undefined. It is not even unspecified or implementation-dependent. It simply consists of (conceptually) "repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type." (§6.3.1.3/p2)

So there is a well-defined procedure to convert the argument expression to a (possibly signed) int argument, and a well-defined procedure for converting that value to an unsigned char. I therefore argue that a program such as the one presented above is entirely well-defined.

For corroboration, the behaviour of fprintf given a format specifier %c is defined as follows (§7.21.6.1/p8), emphasis added:

the int argument is converted to an unsigned char, and the resulting character is written.

If one were to apply the proposed restrictive interpretation which renders the above program undefined, then I believe that one would be forced to also argue that:

void f(char c) {
  printf("This is a '%c'.\n", c);
}

was also UB. Yet, I think almost every C programmer has written something similar to that without thinking twice about it.

The key part of the question is what is meant by "argument" in §7.21.6.1/p9 (and other parts of §7.21.6.1). The C++ standard is slightly more precise; it specifies that if an argument is subject to the default argument promotions, "the value of the argument is converted to the promoted type before the call", which I interpret to mean that when considering the call (for example, the call of fprintf), the arguments are now the promoted values.

I don't think C is actually different, at least in intent. It uses wording like "the arguments… are promoted", and in at least one place "the argument after promotion". Furthermore, in the description of variadic functions (the va_arg macro, §7.16.1.1), the constraint on the argument type is annotated parenthetically "the type of the actual next argument (as promoted according to the default argument promotions)".

I'll freely agree that all of this is (a) subtle reading of insufficiently precise language, and (b) counting dancing angels. But I don't see any value in declaring that standard usages like the use of %c with char arguments are "technically" UB; that denatures the concept of UB and it is hard to believe that such a prohibition would be intentional, which leads me to believe that the interpretation was not intended. (And, perhaps, should be corrected editorially.)

rici
  • What is the purpose of the "but its value shall be converted to signed char or unsigned char before printing"? What does this extra conversion accomplish? (Also see my comment addressed to Joachim under the question.) – Mike Nakis Feb 07 '15 at 22:10
  • @MikeNakis: I was editing the question to answer that, in the expectation that someone would want to know. Let me know if the edit helps. – rici Feb 07 '15 at 22:12
  • Yes, it helps, thank you! (And yes, of course, why didn't I think of it!) – Mike Nakis Feb 07 '15 at 22:13
  • This gives two warnings in GCC ("warning: unknown conversion type character 'h' in format" and "warning: too many arguments for format"), but it works. – CHRIS Feb 07 '15 at 22:31
  • @CHRIS: `--std=c99` or `--std=c11` – rici Feb 07 '15 at 22:35
  • Using C99 or C11 still gives the warnings. – CHRIS Feb 07 '15 at 22:38
  • Using your exact example still gives me warnings in GCC with C99 or C11. – CHRIS Feb 07 '15 at 22:53
  • @CHRIS: which gcc version? – rici Feb 07 '15 at 22:55
  • @rici I remember now, `%hhu` has practical value on systems with `sizeof(int) == 1` – M.M Feb 07 '15 at 22:56
  • @CHRIS update your question to include the code that gives warnings – M.M Feb 07 '15 at 22:57
  • @MattMcNabb: yeah, that's what I just wrote in a comment to gio. I'm going to reword that comment into the answer. – rici Feb 07 '15 at 22:58
  • @CHRIS: I compiled it fine with `gcc -Wall -Wextra -pedantic -std=c99 ` using gcc 4.8.2. ideone claims to have 4.9. So you need to be more specific about your environment compiler version and compiler options. – rici Feb 07 '15 at 22:59
  • @rici Your example is demonstrating UB (plain char is signed on your system but you are using an unsigned specifier `X`) -- including `hh` doesn't remove the UB – M.M Feb 07 '15 at 23:02
  • this is all such a horrible mess, the standard has never been clear – M.M Feb 07 '15 at 23:04
  • @MattMcNabb: You may well be right, in which case your description of it as "technical" was justified. Afaics, the only case in which you would invoke UB (as opposed to unspecified/implementation-defined) is the bizarrely legal case where unsigned types have the same number of magnitude bits as signed types and the existence of a 1 bit in the sign position makes an unsigned value a trap representation. (Or in other words, unsigned types are the same as signed types, except they trap if negative.) But I could check for that case using `limits.h`. – rici Feb 07 '15 at 23:16
  • @MattMcNabb: anyway, although it would be fun to chat more language-lawyer to language-lawyer, I have to go do some errands. Grab me later if you feel like it. – rici Feb 07 '15 at 23:17
  • @MattMcNabb: OK, first draft of argument that it's not UB is in the post. – rici Feb 08 '15 at 01:11
  • @rici "one would be forced to also argue" - yes that is technically UB, but we pretend it isn't for pragmatic reasons. – M.M Feb 08 '15 at 01:20
  • @MattMcNabb: Why is it UB? The `char` is promoted to `int`, and then converted to `unsigned char`. `char c = ...; int i = c; unsigned char d = i;` is completely legal, no? – rici Feb 08 '15 at 01:25
  • @rici C11 7.21.6.1/8 says that the *argument* must be an `int`, even for `%c`. It doesn't say "the argument (after applying the default argument promotions)" or anything similar. 3.3/1 defines "argument" as the expression that appears in the function call, which is a `char` in this case. – M.M Feb 08 '15 at 02:10
  • @MattMcNabb: That's the crux of the discussion. The wording in the standard seems to me to imply that the default argument promotions *change the argument*, rather than being some kind of thing intermediate between an argument and a parameter. For example, in 7.16.1.1/2, "if type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions)" (which seems relevant since that's about variadic functions). Arguably it's a lacuna in the standard, but I feel that it cannot be right that almost every significant program be "technically" UB. – rici Feb 08 '15 at 02:36
  • @rici I agree that it should be considered a defect – M.M Feb 08 '15 at 03:24