
I'm getting unexpected results when loading a binary file in C.

FILE *bin = NULL;
unsigned long file_length = 0;

bin = fopen("vs.bin", "rb");
fseek(bin, 0, SEEK_END);
file_length = ftell(bin);
fseek(bin, 0, SEEK_SET);

char *buffer = (char *)malloc(file_length);
fread(buffer, 1, file_length, bin);

for(unsigned int i = 0; i < file_length; i++) {
    printf("%02x ", buffer[i]);
}
printf("\n");

What I see in the first eight values of output is this:

56 53 48 05 ffffffa4 ffffff8b ffffffef 49

But what I see when I open the binary in a hex editor is this:

56 53 48 05 A4 8B EF 49

What would cause this? The same thing happens at other places in the output, but I thought the first eight bytes would be enough to illustrate the problem.

Thanks for reading.

flixilplix

2 Answers


Change char *buffer to unsigned char *buffer. Also change %02x to %02hhx.

In your C implementation, char is signed. When you read data into a buffer of char, you have signed values. When you use them in an expression (including arguments to printf), some of them have negative values. Additionally, values narrower than int are generally promoted to int. At that point, the char value −92 (which is represented with bits 0xA4) becomes the int value −92 (which is represented with bits 0xFFFFFFA4, in your C implementation).

So you have negative values that are converted to int and then printed with %02x, and %02x shows all the bits of the int. (In %02x, 2 specifies the minimum width; it does not restrict the result to two digits.)
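The promotion described above can be reproduced in isolation. This is a minimal sketch, not the asker's code: format_as_promoted is a hypothetical helper, signed char is used explicitly so the result does not depend on whether plain char is signed, and the expected digits assume a 32-bit int.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats a signed char the way the original
   printf("%02x ", buffer[i]) call effectively did.
   Returns the number of hex digits produced by snprintf. */
int format_as_promoted(signed char c, char *out, size_t out_size)
{
    /* Integer promotion preserves the value, so -92 stays -92 as an int. */
    int promoted = c;

    /* %x expects unsigned int; converting -92 yields 0xffffffa4 when int
       is 32 bits wide (an assumption). The "2" in %02x is only a minimum
       width, so all eight digits are printed. */
    return snprintf(out, out_size, "%02x", (unsigned int)promoted);
}
```

Under those assumptions, format_as_promoted(-92, out, sizeof out) produces "ffffffa4", while a non-negative byte such as 0x05 still produces "05".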

%hhx is a proper conversion specifier for an unsigned char. %x is for unsigned int.
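A short sketch of both changes together, assuming the bytes are already in memory. bytes_to_hex is a hypothetical helper introduced only for illustration; it writes into a string with snprintf so the result is easy to inspect, but the same %02hhx specifier works with printf.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes len bytes as space-separated two-digit hex
   into out. Returns the number of characters written (excluding the
   terminating NUL), or -1 if out is too small. */
int bytes_to_hex(const unsigned char *data, size_t len,
                 char *out, size_t out_size)
{
    size_t pos = 0;

    if (out_size == 0) {
        return -1;
    }
    out[0] = '\0';

    for (size_t i = 0; i < len; i++) {
        const char *sep = (i == 0) ? "" : " ";
        /* unsigned char holds 0..255, so promotion to int never yields a
           negative value; hh tells printf the argument came from a char. */
        int n = snprintf(out + pos, out_size - pos, "%s%02hhx", sep, data[i]);
        if (n < 0 || (size_t)n >= out_size - pos) {
            return -1; /* output would have been truncated */
        }
        pos += (size_t)n;
    }
    return (int)pos;
}
```

For the eight bytes from the question (0x56, 0x53, 0x48, 0x05, 0xA4, 0x8B, 0xEF, 0x49), this yields "56 53 48 05 a4 8b ef 49", matching the hex editor apart from letter case.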

Eric Postpischil

The format specifier %02x specifies the minimum number of digits to be printed out, not the maximum. The values a4, 8b and ef are all negative when interpreted as signed bytes, so what you're seeing is the two's complement representation of these values as 32-bit ints, which is what they're promoted to when passed to printf.

Declare buffer explicitly as unsigned char or uint8_t (from <stdint.h>) to avoid this unintended sign extension, and use the correct format specifier (%hhx for lowercase a-f hex digits, %hhX for uppercase).
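As a rough sketch of the uint8_t variant, the loading step from the question could look like this. load_file is a hypothetical name, and the error checks are additions that the original code does not have; the fseek/ftell approach to finding the length is kept from the question.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a whole file into a malloc'd uint8_t buffer.
   On success returns the buffer and stores its length in *out_len;
   on any failure returns NULL. The caller frees the buffer. */
uint8_t *load_file(const char *path, size_t *out_len)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        return NULL;
    }

    uint8_t *buf = NULL;
    long size = -1;

    /* Same length calculation as in the question, with results checked. */
    if (fseek(fp, 0, SEEK_END) == 0) {
        size = ftell(fp);
    }
    if (size >= 0 && fseek(fp, 0, SEEK_SET) == 0) {
        /* Allocate at least one byte so an empty file is not an error. */
        buf = malloc(size > 0 ? (size_t)size : 1);
        if (buf != NULL && fread(buf, 1, (size_t)size, fp) != (size_t)size) {
            free(buf); /* short read: treat as failure */
            buf = NULL;
        }
        if (buf != NULL) {
            *out_len = (size_t)size;
        }
    }

    fclose(fp);
    return buf;
}
```

Each element can then be printed with printf("%02hhx ", buf[i]); because uint8_t is unsigned, a byte such as 0xA4 stays 164 after promotion instead of becoming a negative int.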

Govind Parmar