
So I'm using the following code to put an integer into a `char[]` or an `unsigned char[]`:

int i = 1000000000;
unsigned char test[12];   /* or plain "char test[12];" (see below) */

test[0] = (i >> 24) & 0xFF;
test[1] = (i >> 16) & 0xFF;
test[2] = (i >> 8) & 0xFF;
test[3] = (i >> 0) & 0xFF;

int j = test[3] + (test[2] << 8) + (test[1] << 16) + (test[0] << 24);

printf("Its value is...... %d", j);

When I use type `unsigned char` and the value 1000000000, it prints correctly.

When I use type `char` (same value), I get 983157248 printed.

So the question really is: can anyone explain what the hell is going on?


Upon examining the binary for the two numbers, I still can't work out what's going on. I thought signed meant the MSB was set to 1 to indicate a negative value (but a negative char? wth?)

I'm explicitly telling the buffer what to insert into it and how to interpret the contents, so I don't see why this could be happening.

I have included the binary/hex I examined below for clarity.

11 1010 1001 1001 1100 1010 0000 0000 // Binary for 983157248

11 1011 1001 1010 1100 1010 0000 0000 // Binary for 1000000000

3A 99 CA 00 // Hex for 983157248 (0x3A99CA00)

3B 9A CA 00 // Hex for 1000000000 (0x3B9ACA00)
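The two hex values differ exactly where sign extension bites. A minimal sketch, assuming `char` is a signed, 8-bit, two's-complement type (as it evidently is on the asker's platform), reproduces 983157248 by hand: the bytes 0x9A and 0xCA read back as -102 and -54 instead of 154 and 202. The reassembly is done with multiplication in `int64_t` so the negative terms are well-defined; the names `reassemble_signed` and `as_signed_byte` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: mimics test[3] + (test[2] << 8) + (test[1] << 16) + (test[0] << 24)
   for byte values already read back from the buffer. Multiplication in int64_t
   replaces the shifts, so negative (signed char) bytes are well-defined here. */
int64_t reassemble_signed(int b0, int b1, int b2, int b3)
{
    return (int64_t)b3
         + (int64_t)b2 * 256
         + (int64_t)b1 * 65536
         + (int64_t)b0 * 16777216;
}

/* Hypothetical helper: maps a raw byte value 0..255 to the value an 8-bit
   two's-complement signed char holding that bit pattern would have. */
int as_signed_byte(unsigned raw)
{
    return raw < 128 ? (int)raw : (int)raw - 256;
}
```

With the signed bytes, `reassemble_signed(as_signed_byte(0x3B), as_signed_byte(0x9A), as_signed_byte(0xCA), as_signed_byte(0x00))` gives 983157248 (0x3A99CA00): -54 is 202 - 256, which subtracts 1 from the 0x9A byte (giving 0x99), and -102 is 154 - 256, which subtracts 1 from the top byte (giving 0x3A). Passing the raw bytes 0x3B, 0x9A, 0xCA, 0x00 gives 1000000000.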

Blue42
  • C doesn't really have "characters"; it only has small integers. `char` is just the name for the smallest integer type, with a minimum range of `[-127, 127]` (if signed) or `[0, 255]` (if unsigned), and (unfortunately) whether or not it is signed is implementation-defined. For this use case you *must* use `unsigned char`. – zwol Jun 06 '13 at 18:12
  • Also, to avoid triggering undefined behavior when you shift something into `int`'s sign bit, you need to do the reassembly in `uint32_t` and cast to `int` afterward. – zwol Jun 06 '13 at 18:14
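Following both comments, a minimal sketch of the round trip: store the bytes in `unsigned char`, and widen each one to `uint32_t` before shifting it back. The names `pack_be32` and `unpack_be32` are hypothetical, and the most-significant-byte-first order simply mirrors the question's code.

```c
#include <stdint.h>

/* Hypothetical helper: writes v into buf[0..3], most significant byte first. */
void pack_be32(unsigned char buf[4], uint32_t v)
{
    buf[0] = (unsigned char)((v >> 24) & 0xFF);
    buf[1] = (unsigned char)((v >> 16) & 0xFF);
    buf[2] = (unsigned char)((v >> 8) & 0xFF);
    buf[3] = (unsigned char)(v & 0xFF);
}

/* Hypothetical helper: reads buf[0..3] back. Each byte is widened to uint32_t
   before shifting, so nothing is ever shifted into a signed int's sign bit. */
uint32_t unpack_be32(const unsigned char buf[4])
{
    return ((uint32_t)buf[0] << 24)
         | ((uint32_t)buf[1] << 16)
         | ((uint32_t)buf[2] << 8)
         |  (uint32_t)buf[3];
}
```

Converting the `uint32_t` result back to `int` only preserves the value when it fits in `int`, which is the case for 1000000000.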

3 Answers


When you write `(i >> 24) & 0xFF` etc., you're creating values in the range [0, 256). But (your) `char` has a range of [-128, +128), so you cannot actually store those values sensibly (converting an out-of-range value to a signed type gives an implementation-defined result that is tedious to reason about).

Use unsigned char for unsigned values. The clue is in the name.
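To make the range argument concrete: for 1000000000 the question's shifts produce the byte values 59, 154, 202 and 0, and two of them exceed `SCHAR_MAX` (127 when `char` is 8 bits), so they do not fit in a signed `char`. A minimal sketch that counts them; `count_unrepresentable` is a hypothetical name.

```c
#include <limits.h>

/* Hypothetical helper: counts how many of the four bytes produced by
   (i >> k) & 0xFF exceed SCHAR_MAX, i.e. would not fit in a signed char. */
int count_unrepresentable(unsigned long i)
{
    int count = 0;
    for (int shift = 24; shift >= 0; shift -= 8) {
        unsigned long byte = (i >> shift) & 0xFF;  /* always in [0, 255] */
        if (byte > (unsigned long)SCHAR_MAX)
            count++;
    }
    return count;
}
```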

Kerrek SB
  • I think the name part doesn't apply here, since an `unsigned int` still won't fit even if you use `unsigned char`s (unless you still use `&0xff`, but then you can't have all the values your int can represent). – Kevin Jun 06 '13 at 18:09
  • This isn't the problem the OP is facing, although shoving `unsigned` things into `signed` boxes is not a good idea to begin with. The issue is the left shifting of a signed type. – Nik Bougalis Jun 06 '13 at 18:12
  • I thought `& 0xFF` just took the last 8 bits off the integer. I don't think that's quite right, though, because `memcpy(&test, &i, 4);` doesn't do the shift but still has the same problem. – Blue42 Jun 06 '13 at 18:15
  • Nik, your answer sounds pretty sensible!! When you shift a signed type, is it filled with the same value as the MSB? I know that when extending a 16-bit signed value to 32 bits you just copy the MSB. Also, as all my values will be positive, I guess I can just use unsigned int then. – Blue42 Jun 06 '13 at 18:16
  • wait, unsigned int makes no sense, need to type the array properly >.> – Blue42 Jun 06 '13 at 18:24
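On the `memcpy` point: copying the `int` into the buffer is not where things go wrong; the damage happens when individual bytes are read back through a signed `char` and combined arithmetically. A minimal sketch (the name `int_roundtrip` is hypothetical) that copies the bytes out and back with `memcpy` never does arithmetic on a single byte, so the element type's signedness cannot matter. Note that `memcpy` stores the bytes in the machine's native order, which on little-endian hardware is the reverse of the question's shifts, so mixing `memcpy` with the shift-based reassembly would scramble the result as well.

```c
#include <string.h>

/* Hypothetical helper: copies an int into a byte buffer and back again.
   memcpy moves the raw bytes in native order; no byte is ever widened
   or shifted, so the signedness of the buffer's element type is irrelevant. */
int int_roundtrip(int i)
{
    unsigned char buf[sizeof(int)];
    memcpy(buf, &i, sizeof i);
    int j;
    memcpy(&j, buf, sizeof j);
    return j;
}
```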

In addition to the answer by Kerrek SB please consider the following:

Computers (almost always) use something called two's-complement notation for negative numbers, with the high bit functioning as a 'negative' indicator. Ask yourself what happens when you perform shifts on a signed type, considering that the computer handles the sign bit specially.

You may want to read "Why does left shift operation invoke Undefined Behaviour when the left side operand has negative value?" right here on Stack Overflow for a hint.
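A minimal sketch of the "handled specially" part, assuming 8-bit `signed char` and the optional `uint32_t` type: before `<<` is applied, each `char` operand is promoted to `int`, and a negative value keeps its value, so the byte 0x9A (-102) has the bit pattern 0xFFFFFF9A as a 32-bit value. Those extra 1 bits are what disturb the sum in the question (and left-shifting a negative `int` is undefined behavior, per the linked question). Masking with `0xFF` after the conversion keeps only the original 8 bits. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: the 32-bit pattern of a signed char after promotion.
   int -> uint32_t conversion is defined as reduction modulo 2^32,
   so -102 becomes 0xFFFFFF9A: the sign has been extended into the upper bits. */
uint32_t promoted_bits(signed char c)
{
    int promoted = c;          /* integer promotion: value -102 stays -102 */
    return (uint32_t)promoted;
}

/* Hypothetical helper: recovers the original 8-bit pattern by masking. */
uint32_t low_byte(signed char c)
{
    return (uint32_t)c & 0xFFu;
}
```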

Nik Bougalis
  • Interestingly, I can insert values as I did previously and chuck in an (ugly) cast at the last minute, i.e. `unsigned char *tmp = (unsigned char *)test;`, then use `tmp` in the print. Thanks though, this really helped =) – Blue42 Jun 06 '13 at 18:29
  • Right - because the cast tells the compiler how to interpret the value. The computer doesn't store a `signed` value any differently than an `unsigned` (although, *in principle*, it could). The only difference is how the computer **manipulates** the data. For example, there are different instructions to perform a signed vs. an unsigned multiplication. With either version, the data is stored in the same registers or just generic memory, not in "special" `signed` or `unsigned` registers or memory. – Nik Bougalis Jun 06 '13 at 19:46
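The same idea as the cast in Blue42's comment, sketched as a value conversion instead of a pointer cast: convert each plain `char` to `unsigned char` before widening it, so a byte such as 0x9A reads back as 154 whether `char` is signed or not, and do the shifting in `uint32_t`. `unpack_from_char` is a hypothetical name, and the buffer layout is the question's most-significant-byte-first order.

```c
#include <stdint.h>

/* Hypothetical helper: reassembles 4 big-endian bytes stored in a plain char buffer.
   Converting to unsigned char reduces the value modulo 2^CHAR_BIT, mapping a
   negative char such as -102 back to its byte value 154, so no sign extension occurs. */
uint32_t unpack_from_char(const char buf[4])
{
    return ((uint32_t)(unsigned char)buf[0] << 24)
         | ((uint32_t)(unsigned char)buf[1] << 16)
         | ((uint32_t)(unsigned char)buf[2] << 8)
         |  (uint32_t)(unsigned char)buf[3];
}
```

On two's-complement hardware this gives the same bytes as reading through an `unsigned char *`; the conversion form just avoids the extra pointer.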

This all has to do with internal representation and the way each type interprets that data. In a signed character (on the usual two's-complement machines), the first bit of your byte acts as the sign: when it is 1, the number is negative, and its magnitude is the two's complement of the whole byte (invert every bit, then add 1). For example:

#include <iostream>
using std::cout;

int main() {
    unsigned char c = 0xCB;   // internal representation 1100 1011
                              // = 2^7 + 2^6 + 2^3 + 2^1 + 2^0
    cout << (int)c << '\n';   // prints 203 (cast, otherwise cout prints a character)

                              // inversely:

    signed char d = c;        // signed, not unsigned (conversion is implementation-defined)
    cout << (int)d << '\n';   // prints -53 on two's-complement machines:
                              // the first bit is 1, so d is negative, and its
                              // magnitude is the two's complement of the byte:
                              // 1100 1011 -> invert -> 0011 0100 -> add 1 -> 0011 0101 = 53

                              // furthermore:

    signed char e = 0x35;     // internal representation 0011 0101
                              // = 2^5 + 2^4 + 2^2 + 2^0
    cout << (int)e << '\n';   // prints 53
}
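The decoding rule described in this answer can be checked with plain integer arithmetic, without relying on how a particular compiler treats `char`. A minimal sketch with hypothetical names: an 8-bit pattern `b` means `b` if the top bit is clear and `b - 256` otherwise, and the magnitude of a negative value is found by inverting all 8 bits and adding 1.

```c
/* Hypothetical helper: interprets an 8-bit pattern (0..255) as two's complement. */
int decode_twos_complement8(unsigned b)
{
    b &= 0xFFu;                               /* keep only the low 8 bits */
    return (b & 0x80u) ? (int)b - 256 : (int)b;
}

/* Hypothetical helper: the "invert all bits and add 1" step, kept to 8 bits. */
unsigned negate8(unsigned b)
{
    return (~b + 1u) & 0xFFu;
}
```

For the example above, `decode_twos_complement8(0xCB)` is -53 and `negate8(0xCB)` is 0x35, i.e. 53.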
soldari