What causes 0xA4 to become 0xffffffa4 when reading a binary file?

Question

I'm getting unexpected results when loading a binary file in C.

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *bin = NULL;
    unsigned long file_length = 0;

    bin = fopen("vs.bin", "rb");
    if (bin == NULL) {
        perror("vs.bin");
        return 1;
    }

    fseek(bin, 0, SEEK_END);
    file_length = ftell(bin);
    fseek(bin, 0, SEEK_SET);

    char *buffer = (char *)malloc(file_length);
    fread(buffer, 1, file_length, bin);

    for(unsigned int i = 0; i < file_length; i++) {
        printf("%02x ", buffer[i]);
    }
    printf("\n");

    free(buffer);
    fclose(bin);
    return 0;
}

What I see in the first eight values of output is this:

56 53 48 05 ffffffa4 ffffff8b ffffffef 49

But what I see when I open the binary in a hex editor is this:

56 53 48 05 A4 8B EF 49

What would cause this? The same thing happens at other places throughout the file, but I thought sharing only the first segment would be enough to illustrate the problem.

Thanks for reading.

I have sometimes found the compiler option `-funsigned-char` useful. – TDk, Feb 25 '19 at 22:18

Eric Postpischil · Accepted Answer · score 7 · 2019-02-25T22:03:37.197

Change char *buffer to unsigned char *buffer. Also change %02x to %02hhx.

In your C implementation, char is signed. When you read data into a buffer of char, bytes from 0x80 to 0xFF are stored as negative values, so when you use them in an expression (including as arguments to printf), they are negative. Additionally, values of types narrower than int are promoted to int in most expressions, including when passed to a variadic function such as printf. At that point, the char value −92 (which is represented with bits 0xA4) becomes the int value −92 (which is represented with bits 0xFFFFFFA4, in your C implementation).

So you have negative values that are converted to int and then printed with %02x, and %02x shows all the bits of the int. (In %02x, 2 specifies the minimum width; it does not restrict the result to two digits.)

%hhx is a proper conversion specifier for an unsigned char. %x is for unsigned int.

edited Feb 25 '19 at 22:03

answered Feb 25 '19 at 21:57

Eric Postpischil


Isn't `%hhx` also proper for `signed char`? – Eugene Sh. Feb 25 '19 at 22:00

@EugeneSh.: There are language-lawyer questions there I do not want to go into here. – Eric Postpischil Feb 25 '19 at 22:02
Ugh! Of course! I'm happy to move past this and embarrassed I didn't catch it myself. Thanks, Eric. – flixilplix Feb 26 '19 at 03:16

score 0 · Answer 2 · answered Feb 25 '19 at 22:00

The format specifier %02x specifies the minimum number of digits to be printed out, not the maximum. The values a4, 8b and ef are all negative when interpreted as signed bytes, so what you're seeing is the two's complement representation of these values as 32-bit ints, which is what they're promoted to when passed to printf.

Explicitly declare buffer as unsigned char or uint8_t to avoid this unintended sign extension, and use the correct format specifier (%hhx for lowercase a-f hex digits, %hhX for uppercase).
