
Consider the simplified code below. I want to extract some binary data from a file and print it to standard output in hexadecimal format.

I get 3 extra bytes, 0xFFFFFF. What's wrong? Where do the extra bytes come from?

output

in:
        2000FFFFFFAF00690033005A00
out:
        2000FFFFFFAF00690033005A00

program.c

#include <stdio.h>
#include <stdlib.h>    

int main(int argc, char** argv) {

    int i;
    char raw[10] = {0x20,0x00,0xAF,0x00,0x69,0x00,0x33,0x00,0x5A,0x00};
    FILE *outfile;
    char *buf;

    printf("in:\n\t");
    for( i=0; i<10; i++ )
        printf("%02X", raw[i]);

    outfile = fopen("raw_data.bin", "w+b");

    fwrite(raw, 1, 10, outfile);

    buf = (char *) malloc (32 * sizeof(char));
    fseek(outfile, 0, SEEK_SET);
    fread(buf, 1, 10, outfile);

    printf("\nout:\n\t");
    for( i=0; i<10; i++ )
        printf("%02X", buf[i]);

    printf("\n");

    fclose(outfile);
    return 0;
}
user.dz
  • 3
    Use `unsigned char` because `0xAF > CHAR_MAX`. Plain `char` is for strings. – cremno Jun 27 '15 at 16:01
  • 2
    @cremno it actually depends on the system, plain `char` can be either signed or unsigned. – Ryan Haining Jun 27 '15 at 16:04
  • a lot of duplicates here already, although it's hard to find it right now – phuclv Jun 27 '15 at 16:47
  • 3
    Standard warning: Do not cast `void *` (e.g. given by `malloc()`) to other pointers! Also: `sizeof(char)` will never differ from `1`, as that is what it is defined to yield by the standard! – too honest for this site Jun 27 '15 at 16:57
  • @LưuVĩnhPhúc, yes i could find them by searching *printf variable promotion*. But :) I wasn't knowing about this topic. Thanks to everyone for help. – user.dz Jun 27 '15 at 16:58
  • 2
    If changing the type is not an option, just cast it: `printf("%02X", (unsigned char)buf[i]);` – KrisWebDev Jun 14 '16 at 18:46
  • 1
    [Why does printf not print out just one byte when printing hex?](http://stackoverflow.com/q/3555791/995714) – phuclv Nov 04 '16 at 09:22

3 Answers


Sign extension. Your compiler implements char as a signed type. When you pass the chars to printf, each one is sign-extended during its promotion to int. When the most significant bit is 0 this doesn't matter, because the value is extended with 0s.

0xAF in binary is 10101111. Since the most significant bit is 1, the value is extended with all 1s in the conversion to int when it is passed to printf, making it 11111111111111111111111110101111, which is 0xFFFFFFAF, the hex value you see.

Solution: use unsigned char (instead of char) to prevent the sign extension from occurring in the call:

const unsigned char raw[] = {0x20,0x00,0xAF,0x00,0x69,0x00,0x33,0x00,0x5A,0x00};

All of the values in your original example are being sign-extended; it's just that 0xAF is the only one with a 1 in the most significant bit.

Another, simpler example of the same behavior:

signed char c = 0xAF; // probably gives an overflow warning
int i = c; // extra 24 bits are all 1
assert( i == 0xFFFFFFAF );
Ryan Haining

That's because 0xAF when converted from a signed character to a signed integer is negative (it is sign extended), and the %02X format is for unsigned arguments and prints the converted value as FFFFFFAF.

The extra characters appear because printf's %X never silently truncates digits of a value; the 02 is only a minimum width. Non-negative values are sign-extended as well, but that only adds zero bits, so the value still fits in 2 hex digits and %02X prints just two.

Note that there are 2 C dialects: one where plain char is signed, and one where it is unsigned. In yours it is signed. You may be able to change this with a compiler option, e.g. gcc and clang support -funsigned-char and -fsigned-char.

Jens

printf() is a variadic function, and its additional arguments (those corresponding to the ... part of its prototype) are subject to the default argument promotions, so char is promoted to int.

As your char is signed¹ with a two's complement representation, the most significant bit is set to one for the 0xAF element. During promotion the sign bit is propagated, resulting in 0xFFFFFFAF of type int, as presumably sizeof(int) == 4 in your implementation.

By the way, you are invoking undefined behaviour, since the %X format specifier should be used with an argument of type unsigned int, or at least with an int whose MSB is unset (the latter is common, widely accepted practice).

As suggested, you may consider using the unambiguous unsigned char type.


1) An implementation may choose between a signed and an unsigned representation of char. It's rather common that char is signed, but you cannot take it for granted for every other compiler on the planet. Some of them let you choose between these two modes, as mentioned in Jens's answer.

Grzegorz Szpetkowski