Why does the binary value of my char change when using %u?

Question

char a = 0b11111111;

printf("%u", a);

We are storing 1111 1111 in a signed char (with gcc on x86, plain char is signed by default), which means -1. But we print it with %u, so printf should see 0000 0000 0000 0000 0000 0000 1111 1111.

With or without two's complement, this number is 255. So why am I getting 2^32 - 1? It seems that instead of adding leading zeros (as I expected), the program added leading ones.

Type promotion. `(char) -1` is promoted to `(int) -1`, and it’s the latter which is passed to `printf`. – user3840170, Mar 26 '21 at 14:30
"printf should see 0000 0000 0000 0000 0000 0000 1111 1111." --> No. It is UB to print -1 with `"%u"`. – chux - Reinstate Monica, Mar 26 '21 at 14:35
Related: https://stackoverflow.com/questions/27547377/format-specifier-for-unsigned-char – Bob__, Mar 26 '21 at 14:35
Well, much as I like the dupe target, it's actually not a dupe, because the problem here concerns the implementation-defined signedness of char and the default argument promotions of printf. Neither was addressed by me in the linked dupe. I'll vote to reopen. – Lundin, Mar 26 '21 at 14:37
More relevant may be: https://stackoverflow.com/q/7084857/1216776 – stark, Mar 26 '21 at 14:38
An interesting case is `printf("%c", (char)'A');`. The `%c` specifier wants an `int`. The `char` argument would normally get promoted to `int`, but may get promoted to `unsigned int` on some implementations (where `CHAR_MAX > INT_MAX`). But it seems kind of ridiculous that you cannot use a `char` argument here unless there is some unwritten rule that `CHAR_MAX <= INT_MAX`. – Ian Abbott, Mar 26 '21 at 14:54
ASIDE: Assuming the non-standard `0b11111111` is exactly equivalent to `0xff`, and that `CHAR_MAX` is 127, then the initialization `char a = 0b11111111;` initializes `a` to an implementation-defined value, not necessarily -1, or it raises an implementation-defined signal. The compiler might produce a warning. – Ian Abbott, Mar 26 '21 at 15:07
Signed types are widened to a larger size by copying the MSB, not by filling with zeros. – i486, Mar 26 '21 at 15:24
@user3840170 That's what I suspected, but I'm having trouble understanding WHY this promotion happens. From what I read [here](https://wiki.sei.cmu.edu/confluence/display/c/INT02-C.+Understand+integer+conversion+rules), integer promotion happens when you use an operator on chars and shorts. But I didn't perform an operation here. So at what point exactly did my char get promoted? – sadcat_1, Mar 26 '21 at 15:58
When you call a variadic function like `printf()`, parameters passed as part of the `...` (ellipsis) undergo [default argument promotions](http://port70.net/~nsz/c/c11/n1570.html#6.5.2.2p6). That means that `char` is promoted to `int`, and a negative `char` value (when plain `char` is a signed type) is promoted to a [negative integer](http://port70.net/~nsz/c/c11/n1570.html#6.3.1.1). – Jonathan Leffler, Mar 26 '21 at 16:33
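To see where the promotion happens, here is a minimal sketch with a hypothetical variadic function of our own, `first_vararg_as_int` (not a library function). Because the argument travels through the `...`, the caller has already promoted a `char` to `int`, so the callee has to read it back with `va_arg(ap, int)`; reading it with `va_arg(ap, char)` would be undefined behavior.

```c
#include <stdarg.h>

/* Hypothetical helper: returns its first variadic argument, or 0 if
   count is 0. A char or short passed through "..." has already been
   promoted to int by the caller (default argument promotions), so it
   must be retrieved as int. */
int first_vararg_as_int(int count, ...)
{
    va_list ap;
    va_start(ap, count);
    int value = count > 0 ? va_arg(ap, int) : 0;
    va_end(ap);
    return value;
}
```

With `signed char c = -1;`, the call `first_vararg_as_int(1, c)` returns -1, not 255: the promoted `int` keeps the value of the `char`, not its bit pattern.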

score 2 · Answer 1 · edited Mar 26 '21 at 16:13

There are multiple problems in your example:

`char a = 0b11111111;` uses a compiler extension for binary literals (binary constants were only standardized in C23).
`char a = 0b11111111;` has implementation-defined behavior if `char` is signed and `CHAR_MAX < 255`.
`printf("%u", a)` has undefined behavior because the `char` value `a` is promoted to `int` when passed to `printf`, which expects an `unsigned int` for the format `%u`.

The one exception is the rare architectures (mostly DSPs) where `char` is unsigned by default and has the same size as `unsigned int`. But then `char` is not signed, and your example does not pose a problem.

If you want to print the value as an `unsigned char`, use `%hhu`, or use `%u` and cast the argument to `(unsigned char)`.
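A sketch of the `%hhu` route, assuming 8-bit `char` for the "255" result. `format_hhu` is a hypothetical helper that writes into a buffer with `snprintf` so the result can be inspected; the `hh` modifier tells `printf` to convert the promoted argument back to `unsigned char` before printing.

```c
#include <stdio.h>

/* Hypothetical helper: formats the byte value of c into buf.
   The (unsigned char) cast yields a value in 0..UCHAR_MAX, which is
   promoted to int for the call; %hhu converts it back to unsigned
   char before printing. Returns snprintf's character count. */
int format_hhu(char *buf, size_t size, char c)
{
    return snprintf(buf, size, "%hhu", (unsigned char)c);
}
```

With the question's `a` (bit pattern 1111 1111), `format_hhu(buf, sizeof buf, a)` stores "255" in `buf`.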

score 0 · Answer 2 · answered Mar 26 '21 at 14:58


As @Lundin stated, the conversion rules apply.

To print the value correctly:

    printf("%hhu\n", (unsigned char)a);
    printf("%u\n", (unsigned char)a);
    printf("%"PRIu8"\n", (uint8_t)a);
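The `PRIu8` line needs `<inttypes.h>`, which the snippet leaves out. A self-contained sketch of that variant, with `format_u8` as a hypothetical buffer-based wrapper and assuming the platform provides the optional `uint8_t` type (that is, 8-bit bytes):

```c
#include <inttypes.h> /* PRIu8; also includes <stdint.h> for uint8_t */
#include <stdio.h>

/* Hypothetical helper: formats a as an 8-bit unsigned value.
   PRIu8 expands to a string literal holding the right conversion
   for uint8_t (for example "u" or "hhu", depending on the library),
   which is concatenated with "%" at compile time. */
int format_u8(char *buf, size_t size, char a)
{
    return snprintf(buf, size, "%" PRIu8, (uint8_t)a);
}
```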


0___________


`printf("%u\n", (unsigned char)a);` is not correct because it passes an `int` (after promotion) for `%u`, which expects `unsigned int`, and C 2018 7.21.6.1 9 says “… If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.” – Eric Postpischil Mar 26 '21 at 15:09
@EricPostpischil "one promoted type is a signed integer type, the other promoted type is the corresponding unsigned integer type, and the value is representable in both types;" § 6.5.2.2 6 seems to exempt that case when the value is positive (represented the same as both `unsigned` and `int`). – chux - Reinstate Monica Mar 26 '21 at 15:18
@chux-ReinstateMonica: That is a general rule for function calls, and it only applies to calling functions that are defined without a prototype (and not for variable arguments with `...`). (We do not know how `printf` is defined; it is specified by the standard library clause, which does not say how it is defined, just how it behaves and how it is declared by the headers.) The rule in 7.21.6.1 9 is specifically for `printf`/`fprintf`. I expect it may have been unintentional that 7.21.6.1 9 makes this behavior undefined when it might otherwise have been defined, but that is what it says. – Eric Postpischil Mar 26 '21 at 15:21
@EricPostpischil if sizeof(int) > sizeof(char) then this cast is OK – 0___________ Mar 26 '21 at 15:38
@0___________: The cast from `a` to `unsigned char` is okay. The conversion caused by the promotion from `unsigned char` to `int` is okay. The passing of an `int` argument for a conversion specification of `%u` is not okay. `%u` should be passed an `unsigned int`, and the text I quoted from 7.21.6.1 9 means that, when an `int` is passed for a conversion specification expecting an `unsigned int`, the behavior is not defined. – Eric Postpischil Mar 26 '21 at 15:56
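One way to sidestep the disputed case is to cast all the way to `unsigned int`, so the argument already has exactly the type `%u` expects and no promotion applies. A sketch with a hypothetical helper `format_byte_u`, assuming 8-bit `char` for the "255" result:

```c
#include <stdio.h>

/* Hypothetical helper. The inner (unsigned char) cast keeps only the
   byte value (0..UCHAR_MAX); the outer (unsigned int) cast makes the
   argument an unsigned int, which the default argument promotions
   leave unchanged and which is what %u expects. */
int format_byte_u(char *buf, size_t size, char a)
{
    return snprintf(buf, size, "%u", (unsigned int)(unsigned char)a);
}
```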

score -1 · Answer 3 · answered Mar 26 '21 at 14:44


First of all, `char` may be signed or unsigned depending on the compiler, and is therefore unsuitable for storing raw binary data; see "Is char signed or unsigned by default?".

In your case it is apparently signed, in which case the value 0b11111111 = 255 won't fit. On initialization, 255 is implicitly converted to the signed `char` in an implementation-defined way, most likely yielding the two's complement value -1.
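A small sketch of both points, with hypothetical function names: plain `char`'s signedness can be checked through `<limits.h>`, and `unsigned char` stores `0xFF` without any implementation-defined conversion.

```c
#include <limits.h>

/* Returns nonzero if plain char is signed on this implementation:
   CHAR_MIN is 0 when char is unsigned and equals SCHAR_MIN when
   char is signed. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* For raw binary data, unsigned char always works: UCHAR_MAX is at
   least 255, so 0xFF fits and no implementation-defined conversion
   is involved. */
unsigned int raw_byte_value(void)
{
    unsigned char byte = 0xFF;
    return byte; /* 255 on every conforming implementation */
}
```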

Now, when you pass any small integer type to `printf`, it is implicitly promoted by an oddball rule called the default argument promotions, which applies to the variable arguments (the `...` part) of every function with a variable number of arguments. This rule goes:

the integer promotions are performed on each argument, and arguments that have type float are promoted to double

For the meaning of "small integer type" and integer promotion, please check Implicit type promotion rules.

This means that what's passed to `printf` is an `int`, still with the value -1 but sign extended. Since `int` is likely 4 bytes, it now contains the raw binary 0xFFFFFFFF instead of just 0xFF, which is still the decimal number -1, just in a larger type.
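The type change without a value change can be observed with C11 `_Generic`. Unary `+` applies the same integer promotions that the default argument promotions apply to a `char` argument, so `+sc` shows the type and value that `printf` receives. `IS_INT` and `promoted_value` are hypothetical names for this sketch:

```c
/* Evaluates to 1 if the expression x has type int, else 0 (C11). */
#define IS_INT(x) _Generic((x), int: 1, default: 0)

/* Unary + performs the integer promotions on its operand: the type
   becomes int, while the value (here, possibly -1) is unchanged. */
int promoted_value(signed char sc)
{
    return +sc;
}
```

For `signed char sc = -1;`, `IS_INT(sc)` is 0, `IS_INT(+sc)` is 1, and `promoted_value(sc)` is -1.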

Then finally you tell printf to print it as unsigned int, so it gets converted by printf to the unsigned representation of 0xFFFFFFFF. This is a well-defined conversion (C17 6.3.1.3):

Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

And you will end up with 2^32 - 1 = 4294967295 on a computer with a 32-bit `int`.
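The 6.3.1.3 rule can be written out by hand and checked against a cast. Whether `printf` itself performs this conversion is disputed in the comments below; a cast performs it for certain. A sketch, with `convert_by_rule` as a hypothetical name:

```c
#include <limits.h>

/* C17 6.3.1.3p2 by hand: a negative value gets UINT_MAX + 1 added
   until it is in range. For int inputs one addition suffices, since
   UINT_MAX + 1 >= -INT_MIN. Computed without overflow as
   UINT_MAX - (-(v + 1)); -(v + 1) cannot overflow for v < 0. */
unsigned int convert_by_rule(int v)
{
    if (v >= 0)
        return (unsigned int)v;
    return UINT_MAX - (unsigned int)(-(v + 1));
}
```

The result always matches `(unsigned int)v`, so `printf("%u\n", (unsigned int)a);` is well-defined and prints 4294967295 when `unsigned int` is 32 bits wide.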


Lundin



This answer mixes up the conversions and reinterpretations. When an `int` is passed to `printf` for `%u`, `printf` cannot know there is an `int`, because it is not given that information. So it cannot perform any conversion from `int` to `unsigned`. So there is no “well-defined conversion.” The behavior of passing an incorrect argument type for `printf` is undefined by the C standard. What often happens is the bits are reinterpreted in the new type. That is not a conversion. – Eric Postpischil Mar 26 '21 at 14:50
On the other hand, when the `char` −1 is converted to `int`, that is a conversion, and it is specified by rules about the value, not about sign extension. An implementation may use sign extension to implement the rules, but that is not the rule to teach people. – Eric Postpischil Mar 26 '21 at 14:52
Really good answer up to "Then finally you tell ..." – chux - Reinstate Monica Mar 26 '21 at 15:02
@EricPostpischil No, you are wrong. What happens is that `printf` essentially gets a pointer to an unknown chunk of data and the programmer labels that `%u`. `printf` then has to perform an lvalue access of that location, essentially `*(unsigned int*)&data`. However, this unknown data has _effective type_ int. It is safe to reinterpret that data through the unsigned type corresponding to the effective type of the object. All of this assumes that `printf` is executed on a real-world computer, through any of the real-world C lib implementations that exist. – Lundin Mar 26 '21 at 15:03
If the compiler lib implementation has the ambition to be an ISO-compliant bug collection useless for real-world purposes, then it's another story... – Lundin Mar 26 '21 at 15:04

*Then finally you tell printf to print it as unsigned int, so it gets converted by printf to the unsigned representation of 0xFFFFFFFF.* This is not what happens. `printf()` expects an `unsigned int` argument and does not convert from the `int` that was actually passed. It retrieves the argument from wherever it was passed as if an `unsigned int` were passed. The C Standard mandates that the value retrieved be identical if the `int` value is non negative, but the behavior is at best implementation defined if not. On a hypothetical sign/magnitude machine, `printf` might output `32769`. – chqrlie Mar 26 '21 at 15:04
@chqrlie In practice it does an lvalue access to the data with a pointer type corresponding to the format specifier. Which is fine: we may access a signed `int` through an `unsigned int` pointer. – Lundin Mar 26 '21 at 15:06

@Lundin: I agree, but the conversion semantics of 6.3.1.3 do not apply to this case. – chqrlie Mar 26 '21 at 15:07

First, whether it is safe to reinterpret the data is irrelevant to the fact that it is a reinterpretation of the bits, not a conversion. The statement “it gets converted by `printf`” is false. The bits get reinterpreted; the value is not converted. Second, the reinterpretation may be safe according to the aliasing rules, but that does not make the reinterpretation safe, because `printf`/`fprintf` has an explicit rule that “If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.” – Eric Postpischil Mar 26 '21 at 15:08
@chqrlie "...lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue); this is called _lvalue conversion_." An lvalue conversion isn't a conversion? – Lundin Mar 26 '21 at 15:09
@EricPostpischil Again, this is under the assumption that the compiler lib author didn't decide to add bugs to the library just because chapter 7 says it's undefined behavior to muck up the format string, and that they were a sane person concerned with writing useful software and not a useless language lawyer. Now you can go ahead and link me to the source from glibc or any other real-world C library implementation that does not read the passed parameter to `printf` through an lvalue conversion. – Lundin Mar 26 '21 at 15:15

@Lundin: an lvalue conversion has nothing to do with the point. You pass an `int` value and `printf` might use an `unsigned int` lvalue that gets converted to an `unsigned int` value. No arithmetic conversion occurs during this process, merely the reinterpretation of bits of the representation, and only because `int` and `unsigned int` are passed in a compatible way (same registers, stack location...), which is not even guaranteed by the C Standard. A good example of UB is passing an `int` where a `double` is expected. The output string may not even be related to the `int` value. – chqrlie Mar 26 '21 at 15:15
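The distinction drawn in these comments between a value conversion and a bit reinterpretation can be made concrete. In the sketch below (hypothetical helper names), `convert_int` is defined by 6.3.1.3 for every `int`, while `reinterpret_int` copies the object representation with `memcpy`. For non-negative values the two always agree; for negative values they agree only on two's complement implementations, which C23 requires and which mainstream platforms use.

```c
#include <string.h>

/* Value conversion: result defined by C17 6.3.1.3, independent of
   how int is represented. */
unsigned int convert_int(int i)
{
    return (unsigned int)i;
}

/* Bit reinterpretation: copies the bytes of i into an unsigned int
   (int and unsigned int occupy the same storage, C17 6.2.5p6). The
   value obtained depends on the representation of negative ints. */
unsigned int reinterpret_int(int i)
{
    unsigned int u;
    memcpy(&u, &i, sizeof u);
    return u;
}
```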
