The %x format specifier with an unsigned char in C

Question

I ran across the following example program and I don't exactly understand its output:

#include <stdio.h>

int main( void ) {

    unsigned char i, m =0xFF, n=0x1;

    for ( i = 0; i != 8; i++,n+=n, m/=2 )
        printf("%5x %5x %5x %5x %5x %5x\n", n,m,n&m,n|m,n^m,~n);

    return 0;
}

It prints out:

    1    ff     1    ff    fe fffffffe
    2    7f     2    7f    7d fffffffd
    4    3f     4    3f    3b fffffffb
    8    1f     8    1f    17 fffffff7
   10     f     0    1f    1f ffffffef
   20     7     0    27    27 ffffffdf
   40     3     0    43    43 ffffffbf
   80     1     0    81    81 ffffff7f

The problem is that last column. Since n is an unsigned char, I would expect just two hex digits in every column. I assumed ~n produces an unsigned char as its result, but it seems like it's being converted to a signed 32-bit value and sign-extended by the %5x specifier.

How is that possible? What's going on here?

`n` is a `unsigned char`; `~n` is a `int` (or maybe an `unsigned int` -- too lazy to check now). Try `printf("%d %d\n", (int)sizeof n, (int)sizeof ~n);` :-) — pmg, Aug 25 '11 at 10:48
Not really. The `&`, `|`, `^`, and `~` operations promote their operands to `[unsigned] int` and result in a value of type `[unsigned] int`. — pmg, Aug 25 '11 at 11:05
@pmg: Yeah, I just looked it up in K&R, pg. 197 - 198, section A6.5 Arithmetic Conversions: ... Otherwise, both operands have type int. — Robert S. Barnes, Aug 25 '11 at 11:08
@pmg: It's `int` if that type can hold all the values of `unsigned char` (most likely), otherwise it's `unsigned int`. — caf, Aug 25 '11 at 12:59

Dietrich Epp · Accepted Answer · 2011-08-25T10:54:43.523

10

Integer types are promoted when they are used in arithmetic operations (this has nothing to do with printf, by the way).

So, for example,

unsigned char x = 0xff;
int y = ~x; // x is promoted to 0x000000ff, then changed to 0xffffff00
unsigned char z = ~x; // truncated back to 0x00

Integer promotion causes various problems:

unsigned char x = 1;
if (x << 8)
    puts("x << 8 is true"); // does print
x <<= 8;
if (x)
    puts("x <<= 8 is true"); // does not print

The two ways to truncate things are casting and masks. Use whatever you prefer.

unsigned char x = 0xab;
printf("x = %02x\n", (unsigned char) x);
printf("x = %02x\n", x & 0xff);

Integer promotion doesn't always happen, and it's not the only kind of implicit conversion. It's also a bit subtle and the exact rules are difficult to remember. In practice it matters most when you're working with 64-bit numbers: where unsigned int is 32 bits wide, 1U << 32 shifts by the full width of the type, which is undefined behavior, so it could end up being 0 or 1 or something else entirely. (It's often 1 on x86.)

edited Aug 25 '11 at 10:54

answered Aug 25 '11 at 10:44

Dietrich Epp


But why aren't values in other cols where there is a leading 1 being sign extended? – Robert S. Barnes Aug 25 '11 at 10:47
Sign extension is never performed on an `unsigned char`. The leading one bits are caused by the complement operator `~`. – Dietrich Epp Aug 25 '11 at 10:50
Does this also happen with regular arithmetic operations? By the way, I thought shifting by more than the width of the type is an undefined operation. – Robert S. Barnes Aug 25 '11 at 11:01
@Robert: That's exactly why `1U << 32` is a problem — it should probably be `1ULL << 32`. People forget that `1` is usually 32 bits wide (except in the preprocessor...). Yes, it does happen with regular arithmetic operations. – Dietrich Epp Aug 25 '11 at 11:07
+1 For the only correct answer. – Lundin Aug 25 '11 at 11:19

Rup · Answer 2 · 2011-08-25T11:48:06.027

2

The issue is that it's promoted to an int ~~when you pass it into printf's varargs~~ when you take its complement, as in Dietrich's answer.

Unfortunately you'll need to strip it down to a byte to pass in, i.e. (~n & 0xff).

edited Aug 25 '11 at 11:48

answered Aug 25 '11 at 10:40

Rup


But why aren't other instances with a leading 1, like FF in the first row second col, being promoted and sign extended also? – Robert S. Barnes Aug 25 '11 at 10:43
The problem is not that `unsigned char` is promoted when passed to `printf`, it's that it's promoted before taking the bitwise complement. You could remove `printf` and observe the same values. – Dietrich Epp Aug 25 '11 at 10:45
@Robert What do you mean by sign extended? What result did you expect? – Šimon Tóth Aug 25 '11 at 10:45
@Dietrich Epp: Are you saying that bitwise complement always promotes to int from shorter types? Why would that be different from the other bitwise operators? Or are you saying they all convert from shorter types to long? Effectively you're saying that it's doing something like `~(unsigned int)n`. If I were to do something like `n = ~n;` would I get some kind of truncation warning? – Robert S. Barnes Aug 25 '11 at 10:54
@Robert: All of them are being promoted to `int` (not `unsigned int`). You just don't notice it since e.g. `0x000000ff ^ 0x00000001 = 0x000000fe`. C compilers do not generally give truncation warnings since you'd get a zillion false alarms, although Clang will give you a warning when you cast a pointer to an integer type that is too small. – Dietrich Epp Aug 25 '11 at 11:01
Almost every operator in the C language is subject to _the integer promotion rules_. In fact, it is impossible to perform any form of arithmetic on 8-bit variables in C, they will always be promoted to larger integer types. So this answer seems incorrect, if I remember correctly, the va_arg used by printf() is a macro- and then no implicit promotions take place there. The promotion is caused by using an expression containing operators. – Lundin Aug 25 '11 at 11:17
@Dietrich D'oh yes. Well my fix was right if my reasoning wasn't. – Rup Aug 25 '11 at 11:47

score 0 · Answer 3 · answered Aug 25 '11 at 10:42

0

My guess is that all the chars you pass are interpreted as 32-bit integers, but in the first 4 cases the output is the same. Only in the last one does it show its ugly face :)

You have to mask out all the other bits in order to get a character-like result as in the first 4 examples.

answered Aug 25 '11 at 10:42

Constantinius


But why the difference? Shouldn't all the values with a leading 1, like FF in the first row second col, be extended as well? Why only in the last col? – Robert S. Barnes Aug 25 '11 at 10:44
Because with 32 bits, `~0xFF` is equal to `0xFFFFFF00`, so there are non-zero values in the first 3 bytes. With `0xFF`, only the last byte of the integer is set. – Constantinius Aug 25 '11 at 10:48
@Robert They are all extended. What values did you expect? – Šimon Tóth Aug 25 '11 at 10:49

score 0 · Answer 4 · answered Aug 25 '11 at 18:06

1) You specified the width as %5x, but the width in a C format specifier is only a minimum. If the number needs more characters than the specified width, the width is ignored and the number is printed in full. In the last column the value can't be represented in 5 characters, so the width limit is ignored. Why the value is wider than 5 characters is explained in point 3. Point 2 covers %x versus unsigned char.

2) You declared the variable 'n' as an unsigned char, but you used %x, which is for an unsigned int printed in hexadecimal. So at the time of printing, the value of n is promoted (or, loosely speaking, typecast) to an integer and interpreted as an unsigned int. This doesn't only happen with unsigned char and %x. Try different combinations on your compiler, for example int a=-5; printf("%d %u",a,a); you will get -5 from %d, and from %u you will get some other interpretation that depends on whether int is 16 or 32 bits on your compiler. If you then try unsigned int a=-5; printf("%d %u",a,a); the result will still be the same as before. In the first case you wrote int a=-5 and in the second unsigned int a=-5, but the result depends on how the format specifier interprets the value.

This is about typecasting, or you could say about interpretation by the compiler. When you write %x, you yourself tell the compiler to interpret the argument as an unsigned int in hexadecimal. I hope it's clear now; if not, I suggest you run a few small variations, on several compilers if possible. Try programs like int x=7; printf("%f ", x); and so on, and you will surely get the point. Now I'll explain why the output in the last column looks like fffffffe.

3) Consider the first iteration of the loop. Here n=0x1. With a 32-bit compiler it is represented in memory as 0000 0001, since a char gets 1 byte of memory. But once it is converted to an integer, the representation with a 32-bit compiler is 0000 0000 0000 0000 0000 0000 0000 0001. Whether you declare int n=1, unsigned int n=1 or int n=0x1, the representation is the same. Even with unsigned char n=1, the rightmost bit is 1 and all the others are zero, although there are fewer zeros. Now, in your loop, by writing %x you tell the compiler to interpret the content as an unsigned int in hexadecimal, so the compiler gives it the space of an int, and you perform the operation "~n". After this operation the bit representation becomes 1111 1111 1111 1111 1111 1111 1111 1110. (Caution: printf("%d",~n); is different from printf("%d",n++); because in the n++ case the value of the variable in memory is updated too, whereas printf("%d",~n) is like printf("%d",n+8) and leaves n itself unchanged.) So in this new representation all bits are 1 except the rightmost, and when it is printed it comes out as fffffffe. Simple!

4) You wrote "but it seems like it's being cast to a signed 32 bit value and sign extended by the %5x specifier." Hmm, %x is not for a signed int; %x is for an unsigned int in hexadecimal, so there is no point in assuming that a minus sign got truncated.

5) In your program you used '~n'. Now try different versions such as '-~n' or '-n', just as an experiment.
